Re: [controller-dev] [mdsal-dev] RE: Is Read from follower shard ok and openflowplugin master must be shard leader?

2019-05-31 Thread Tom Pantelis
On Fri, May 31, 2019 at 10:05 AM Robert Varga  wrote:

>
> > #Q11. An application has one instance on every node; how does ODL
> > decide which application instance handles data change notifications and
> > data listeners?
>
> Depends on application integration.
>
> For DataTreeChangeListener, only local events are delivered. For
> ClusteredDataTreeChangeListener, all events are delivered.


> DTCLs are delivered to all registrants.


A DataTreeChangeListener is only notified once in the cluster, i.e. only the
DTCL on the node hosting the current shard leader is notified. A
ClusteredDataTreeChangeListener is notified on every node.
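To make the difference concrete, here is a toy Python model (not the MD-SAL API) of which listener kinds fire for one committed change. It assumes a three-node cluster where every node registered one listener of each kind and the shard leader sits on `member-2`; the node names and the `notified_listeners` helper are illustrative only.

```python
def notified_listeners(nodes, shard_leader):
    """Toy model: which listener kinds fire on each node for one committed change.

    A plain DataTreeChangeListener fires only on the node hosting the shard
    leader; a ClusteredDataTreeChangeListener fires on every node.
    """
    result = {}
    for node in nodes:
        fired = ["ClusteredDataTreeChangeListener"]
        if node == shard_leader:
            fired.append("DataTreeChangeListener")
        result[node] = fired
    return result


print(notified_listeners(["member-1", "member-2", "member-3"], "member-2"))
```

So for N nodes a plain DTCL runs once per change while a clustered one runs N times, which is why logic placed in a clustered listener may need to be idempotent or gated per node.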


___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev


Re: [controller-dev] Help regarding Yang 1.1 actions on RestConf.

2019-05-27 Thread Tom Pantelis
On Mon, May 27, 2019 at 8:06 PM Ajay Deep Singh <
ajay.deep.si...@ericsson.com> wrote:

> Hi Everyone,
>
>
>
> I am facing an issue over RESTCONF when testing new code I added to
> support YANG 1.1 actions. Can someone please help me out here?
>
>
>
> I can see that the old RESTCONF implementation is mapped to /restconf, and
> the new RESTCONF implementation is mapped to /rests/data.
>
>
>
> So I tried adding my simulated node to ODL using the URI below, but it didn't
> work. Can someone highlight what I am missing here? I have installed the
> relevant RESTCONF, NETCONF and MD-SAL features, but still not a single
> endpoint with http://{{odl_ip}}:8181/rests/data/ in the request URL is
> visible in the REST API docs, as in the picture below.
>
>
>
> For example, I mounted the device using the new RESTCONF with a PUT
> operation this way:
>
> PUT: http://127.0.1:8181/rests/data/network-topology:network-topology/topology/topology-netconf/node/pnf-simulator
>
> Adding a keystore entry:
>
> POST: http://localhost:8181/rests/data/netconf-keystore:add-keystore-entry
>
> Adding a private key:
>
> POST: http://localhost:8181/rests/data/netconf-keystore:add-private-key
>
>
>
> None of the above REST requests were successful; any guidance would be
> really helpful.
>

Did you install the odl-restconf-nb-rfc8040 feature?
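As a further sanity check, here is a minimal Python sketch (standard library only) of how the mount request and a keystore RPC call could be built against the RFC 8040 northbound once that feature is installed. Two points are worth flagging: under RFC 8040, list entries are addressed as `list=key` (so `topology=topology-netconf/node=pnf-simulator`, not the `/topology/.../node/...` form of the old `/restconf` API), and RPCs such as `netconf-keystore:add-keystore-entry` are invoked under `/rests/operations/` rather than `/rests/data/`. The `netconf-node-topology:*` leaf names, port 8181 and the `admin`/`admin` credentials are assumptions about a typical ODL setup of that era, and `build_mount_request` / `build_rpc_request` are hypothetical helper names.

```python
import base64
import json
import urllib.request


def _auth_header(user, password):
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_mount_request(odl_host, node_id, device, user="admin", password="admin"):
    """Build (but do not send) an RFC 8040 PUT that mounts a NETCONF device."""
    url = (f"http://{odl_host}:8181/rests/data/network-topology:network-topology/"
           f"topology=topology-netconf/node={node_id}")
    body = {"network-topology:node": [{
        "node-id": node_id,
        # Assumed netconf-node-topology leaf names
        "netconf-node-topology:host": device["host"],
        "netconf-node-topology:port": device["port"],
        "netconf-node-topology:username": device["username"],
        "netconf-node-topology:password": device["password"],
        "netconf-node-topology:tcp-only": False,
    }]}
    req = urllib.request.Request(url, data=json.dumps(body).encode("utf-8"), method="PUT")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", _auth_header(user, password))
    return req


def build_rpc_request(odl_host, rpc_name, rpc_input, user="admin", password="admin"):
    """Build an RFC 8040 RPC call: POST to /rests/operations/<module>:<rpc>."""
    module = rpc_name.split(":", 1)[0]
    url = f"http://{odl_host}:8181/rests/operations/{rpc_name}"
    body = {f"{module}:input": rpc_input}  # RFC 8040 uses a module-qualified "input"
    req = urllib.request.Request(url, data=json.dumps(body).encode("utf-8"), method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", _auth_header(user, password))
    return req
```

Sending either request is then a matter of `urllib.request.urlopen(req)`; building the request separately makes it easy to print and compare the exact URL with what the API docs show.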


>
> Thanks for your time.
>
> Regards,
>
> Ajay


Re: [controller-dev] many logs about log index out-of-sync

2019-01-09 Thread Tom Pantelis
On Wed, Jan 9, 2019 at 4:17 AM Robert Varga  wrote:

> On 09/01/2019 01:41, Tom Pantelis wrote:
> > > Those messages can happen if a follower gets behind the leader,
> > especially if it gets isolated and AppendEntries
> > > messages from the leader get backed up and then the follower gets
> > a bunch of messages quickly when it reconnects to the
> > > leader. Looks like it eventually caught up to the leader although,
> > from the output, the distro you're testing with is
> > > missing https://git.opendaylight.org/gerrit/#/c/78929/ which
> > should speed up the sync process in that case and alleviate
> > > most of those messages.
> >
> > Thanks Tom, makes sense then as this is on one controller that came
> > up last after
> > bouncing them all.
> >
> >
> > yeah it eventually synced but it went down an inefficient path that
> > resulted in 1176 messages from the leader to sync it. Actually that
> > patch I mentioned above was for a different case and wouldn't help in
> > this case. I have an idea where we can optimize it to eliminate those
> > extra messages and make it faster.
>
> https://git.opendaylight.org/gerrit/79334 should help with that (sorry,
> it got lost for a year).
>

yup - that's basically what I was going to do :)
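The inefficiency discussed above can be illustrated with a simplified model of the leader side. In textbook Raft, each negative AppendEntriesReply makes the leader back off its prevLogIndex by one entry and retry; a common optimization is to jump straight to the `logLastIndex` the follower reports. This is a sketch under stated assumptions (all of the follower's entries match the leader's terms, one attempt per reply), not a description of what the Gerrit change above actually does; `count_attempts` is a hypothetical name.

```python
def count_attempts(leader_prev_index, follower_last_index, use_last_index_hint):
    """Count AppendEntries attempts until the follower accepts one.

    Simplification: every entry the follower has matches the leader's term,
    so the only mismatch is that the follower's log is too short.
    """
    prev_index = leader_prev_index
    attempts = 0
    while True:
        attempts += 1
        if prev_index <= follower_last_index:
            return attempts  # follower has prevLogIndex: success=true
        # Follower replied success=false with logLastIndex=follower_last_index.
        if use_last_index_hint:
            prev_index = follower_last_index
        else:
            prev_index -= 1


print(count_attempts(19047, 17875, use_last_index_hint=False))
print(count_attempts(19047, 17875, use_last_index_hint=True))
```

With the values from the log lines in Jamo's original report (prevLogIndex 19047, follower lastIndex 17875), the one-step back-off needs 1173 attempts in this model, while the hint-based jump needs 2.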


>
> Regards,
> Robert
>
>


Re: [controller-dev] many logs about log index out-of-sync

2019-01-08 Thread Tom Pantelis
On Tue, Jan 8, 2019 at 5:40 PM Jamo Luhrsen  wrote:

> I am debugging a 3-node cluster netvirt csit job and one of my controllers
> has a bunch of messages like this:
>
> 2019-01-03T03:04:38,335 | INFO  | opendaylight-cluster-data-shard-dispatcher-21 | Shard | 229 - org.opendaylight.controller.sal-clustering-commons - 1.8.2 | member-3-shard-default-config (Follower): The log is not empty but the prevLogIndex 19047 was not found in it - lastIndex: 17875, snapshotIndex: -1
>
> 2019-01-03T03:04:38,335 | INFO  | opendaylight-cluster-data-shard-dispatcher-21 | Shard | 229 - org.opendaylight.controller.sal-clustering-commons - 1.8.2 | member-3-shard-default-config (Follower): Follower is out-of-sync so sending negative reply: AppendEntriesReply [term=23, success=false, followerId=member-3-shard-default-config, logLastIndex=17875, logLastTerm=4, forceInstallSnapshot=false, payloadVersion=9, raftVersion=3]
>
> log:
>
> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-3node-0cmb-1ctl-2cmp-openstack-queens-upstream-stateful-fluorine/168/odl_3/odl3_karaf.log.gz
>
> The job has lots of csit failures, so I know something is broken somewhere.
> I don't know if the above has anything to do with it or not. Maybe it's
> even expected in this scenario.
>

Those messages can happen if a follower falls behind the leader, especially
if it gets isolated: AppendEntries messages from the leader get backed up,
and the follower then receives a burst of them when it reconnects to the
leader. It looks like it eventually caught up to the leader, although, from
the output, the distro you're testing with is missing
https://git.opendaylight.org/gerrit/#/c/78929/ which should speed up the
sync process in that case and eliminate most of those messages.
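For reference, the follower-side check behind the "prevLogIndex ... was not found" message can be sketched as below. This is a simplified Python model of the Raft AppendEntries consistency rule, not the ODL `Follower` class: log entries are represented only by their terms (`log_terms[i]` is the term of entry `i`), snapshots are ignored (matching `snapshotIndex: -1` in the output), and appending new entries on success is omitted. `check_append_entries` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppendEntriesReply:
    success: bool
    log_last_index: int


def check_append_entries(log_terms, prev_log_index, prev_log_term):
    """Raft consistency check a follower performs on each AppendEntries."""
    last_index = len(log_terms) - 1
    if prev_log_index > last_index:
        # prevLogIndex is beyond the end of the follower's log: out of sync.
        return AppendEntriesReply(False, last_index)
    if prev_log_index >= 0 and log_terms[prev_log_index] != prev_log_term:
        # The entry exists but was written in a different term: also out of sync.
        return AppendEntriesReply(False, last_index)
    return AppendEntriesReply(True, last_index)


print(check_append_entries([1, 1, 2, 2], prev_log_index=5, prev_log_term=2))
```

The leader can use the returned `log_last_index` to decide where to resume replication instead of probing blindly.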


>
> Thanks,
> JamO


Re: [controller-dev] [release] fluorine-sr1 respin build is finished and successful

2018-11-18 Thread Tom Pantelis
On Sun, Nov 18, 2018 at 7:35 AM Ariel Adam  wrote:

> Hi everyone.
> To complete the Fluorine SR1 #273 sign off process we are still missing
> inputs for the controller and Openflowplugin.
>
>- *Tom*, can you take a look at the controller?
>
>
There are 3 tests failing and they've been failing for a while now (as far
back as the saved runs). The same tests have been failing in Neon too, but
not in Oxygen. I don't know why off the top of my head; I'll have to take
some time to study what the tests do, etc. But this is related to basic
functionality (RPCs) and there have been no real changes in quite a while
(especially since the Fluorine release), so I suspect it's something with the
tests. Maybe Luis or Jamo know more about the history... I set them to
IGNORE (for now at least).



>
>- *Luis/Anil*, can you take a look at the Openflowplugin?
>
> Let's close this release since we are already super late.
>
> Thanks.
>
>
> On Thu, Nov 15, 2018 at 6:51 PM Jamo Luhrsen  wrote:
>
>>
>>
>> On 11/15/18 8:44 AM, Jamo Luhrsen wrote:
>> >
>> >
>> > On 11/15/18 7:55 AM, Robert Varga wrote:
>> >> On 15/11/2018 14:43, Sam Hague wrote:
>> >>> The fluorine-sr1 respin is finished. What is left to promote it? Do we
>> >>> need to redo the test signoff or can the previous stand? The only
>> >>> patches were the netvirt one and the distro-check. Netvirt is isolated
>> >>> to just netvirt so it is good.
>> >>
>> >> Oh man, it would have been good to know that a respin is happening --
>> >> https://git.opendaylight.org/gerrit/1 is a no-brainer to include.
>> >>
>> >> A respin requires a re-signoff, just in case the test runs found
>> >> something the previous one did not, methinks.
>> >
>> > I agree.
>>
>> the list of jobs to sign-off is smaller this time. I created a new
>> tab "SR1-respin1 Status":
>>
>>
>> https://docs.google.com/spreadsheets/d/1wtT78KigRQdRi3Gj--jOJJPI7tC4tIi5brmW01vvCsg/edit#gid=932344730
>>
>> those results are from this distro-test:
>>
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/integration-distribution-test-fluorine/273
>>
>> which used this distribution:
>>
>> https://nexus.opendaylight.org/content/repositories//autorelease-2491/org/opendaylight/integration/karaf/0.9.1/karaf-0.9.1.zip
>>
>> Thanks,
>> JamO
>>
>> >> Regards,
>> >> Robert
>> >>
>> >>
>> >> ___
>> >> release mailing list
>> >> rele...@lists.opendaylight.org
>> >> https://lists.opendaylight.org/mailman/listinfo/release
>> >>


Re: [controller-dev] Entity ownership for non-voting member

2018-11-08 Thread Tom Pantelis
On Thu, Nov 8, 2018 at 4:48 PM Ajay Lele  wrote:

>
>
> On Thu, Nov 8, 2018 at 4:32 AM Tom Pantelis  wrote:
>
>>
>>
>> On Wed, Nov 7, 2018 at 9:28 PM Tom Pantelis 
>> wrote:
>>
>>>
>>>
>>> On Wed, Nov 7, 2018 at 3:36 PM Robert Varga  wrote:
>>>
>>>> On 07/11/2018 20:36, Ajay Lele wrote:
>>>> >
>>>> >
>>>> > On Wed, Nov 7, 2018 at 3:58 AM Robert Varga <n...@hq.sk> wrote:
>>>> >
>>>> > On 07/11/2018 02:07, Ajay Lele wrote:
>>>> > > Hi Controller-devs,
>>>> > >
>>>> > > [0] changed EOS behavior such that a non-voting member cannot
>>>> become
>>>> > > entity owner. But there are some situations where we want to
>>>> allow
>>>> > this
>>>> > > e.g. when BGP speaker config is local to node and not
>>>> replicated in
>>>> > > cluster. I think the behavior should be that non-voting member
>>>> > will not
>>>> > > become entity owner *provided* one or more voting candidates are
>>>> > > available. Thoughts?
>>>> >
>>>> > This is a sticky topic. Non-voting members are meant to be used
>>>> for
>>>> > geo-redundancy only, in which case non-voting side should be
>>>> really
>>>> > passive.
>>>> >
>>>> > Can you describe the deployment scenario in more detail?
>>>> >
>>>> >
>>>> > One scenario is BGP route injection using application RIB. Injected
>>>> > route will be withdrawn when BGP connection over which it is
>>>> advertised
>>>> > goes down. To prevent downtime window when primary to DR cutover
>>>> > happens, BGP connection is established from non-voting member from DR
>>>> > site as well. This is accomplished by moving BGP speaker data to a
>>>> > separate shard which is not replicated.
>>>>
>>>> I see, so this is used for planned maintenance only, right?
>>>>
>>>
> Same situation for unplanned as well: as soon as the primary goes down, the
> routes will be withdrawn, so we need to have a separate connection from DR
> in parallel.
>
>
>>>> If that is the case, I think it would make sense to make the selection
>>>> policy switchable at runtime, such that under normal operation only
>>>> voting members are considered, but in preparation for a switchover, the
>>>> policy can be adjusted. Can you file an improvement issue against
>>>> controller, please?
>>>
>>>
>>>> Tom: do you have an opinion on how best implement this? It feels like we
>>>> want to have a per-entity behavior table, so that BGP would switch to
>>>> non-voting, but other services would not...
>>>>
>>>
>>>
> One option I see is to give preference to voting members, but if none
> exists, give ownership to a non-voting one. This will have to be reevaluated
> every time the candidate list changes. Not giving ownership to a
> (non-voting) candidate when no other candidate exists sounds extreme to
> me. What do you think?
>

The problem with that is that it doesn't know what other candidates exist
at the time the decision is made. It would work in your case, assuming
there's only the one local entity candidate in the entire 6-node cluster.
For other entities, a non-voting candidate may arrive first and then a
voting one, in which case ownership would get revoked. That is not ideal
and could lead to quite a bit of churn across all entities before it finally
settles out.

Therefore it seems there would need to be some setting per entity or entity
type, as Robert suggested. However, this BGP use case just seems to go
against the premise of the EOS. The EOS is intended to maintain one entity
owner across a clustered environment. In your case, however, it seems you
essentially want to bypass this behavior and really use the EOS to enable
functionality per node, i.e. you really want to treat it as a
non-clustered/single-node environment where the local candidate always gets
ownership, which is why the BGP data shard is non-replicated. That's
why I suggested not using the EOS at all in this case rather than trying to
shim changes in to accommodate a (seemingly) incompatible use case. That's
my opinion.
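The churn argument can be illustrated with a small, hypothetical Python model. It is not the ODL EOS code: `Candidate`, `select_owner` (Ajay's "prefer voting members, otherwise fall back to a non-voting one" policy, with a sticky-owner tweak to avoid gratuitous moves), `replay` and the member names are all assumptions for illustration. Ownership is re-evaluated each time a candidate registers, as Ajay proposed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    member: str
    voting: bool


def select_owner(candidates, current_owner=None):
    """Hypothetical policy: prefer voting candidates, else fall back to any."""
    voting = [c for c in candidates if c.voting]
    pool = voting or list(candidates)
    if not pool:
        return None
    # Keep the current owner while it is still eligible, to limit churn.
    if current_owner in pool:
        return current_owner
    return pool[0]


def replay(arrivals):
    """Re-evaluate ownership after each registration; return the owner history."""
    candidates, owner, history = [], None, []
    for cand in arrivals:
        candidates.append(cand)
        new_owner = select_owner(candidates, owner)
        if new_owner != owner:
            history.append(new_owner.member)
            owner = new_owner
    return history


print(replay([Candidate("member-4", False), Candidate("member-1", True)]))
print(replay([Candidate("member-1", True), Candidate("member-4", False)]))
```

With the non-voting candidate registering first, ownership lands on `member-4` and is then revoked when `member-1` (voting) shows up; with the opposite order there is a single assignment. Repeated across every entity, those extra hand-offs are the churn described above.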



> Actually, voting status is being considered for the EOS shard, whereas in the
> BGP case we create a separate shard for BGP dat

Re: [controller-dev] Entity ownership for non-voting member

2018-11-08 Thread Tom Pantelis
On Wed, Nov 7, 2018 at 9:28 PM Tom Pantelis  wrote:

>
>
> On Wed, Nov 7, 2018 at 3:36 PM Robert Varga  wrote:
>
>> On 07/11/2018 20:36, Ajay Lele wrote:
>> >
>> >
>> > On Wed, Nov 7, 2018 at 3:58 AM Robert Varga <n...@hq.sk> wrote:
>> >
>> > On 07/11/2018 02:07, Ajay Lele wrote:
>> > > Hi Controller-devs,
>> > >
>> > > [0] changed EOS behavior such that a non-voting member cannot
>> become
>> > > entity owner. But there are some situations where we want to allow
>> > this
>> > > e.g. when BGP speaker config is local to node and not replicated
>> in
>> > > cluster. I think the behavior should be that non-voting member
>> > will not
>> > > become entity owner *provided* one or more voting candidates are
>> > > available. Thoughts?
>> >
>> > This is a sticky topic. Non-voting members are meant to be used for
>> > geo-redundancy only, in which case non-voting side should be really
>> > passive.
>> >
>> > Can you describe the deployment scenario in more detail?
>> >
>> >
>> > One scenario is BGP route injection using application RIB. Injected
>> > route will be withdrawn when BGP connection over which it is advertised
>> > goes down. To prevent downtime window when primary to DR cutover
>> > happens, BGP connection is established from non-voting member from DR
>> > site as well. This is accomplished by moving BGP speaker data to a
>> > separate shard which is not replicated.
>>
>> I see, so this is used for planned maintenance only, right?
>>
>> If that is the case, I think it would make sense to make the selection
>> policy switchable at runtime, such that under normal operation only
>> voting members are considered, but in preparation for a switchover, the
>> policy can be adjusted. Can you file an improvement issue against
>> controller, please?
>
>
>> Tom: do you have an opinion on how best implement this? It feels like we
>> want to have a per-entity behavior table, so that BGP would switch to
>> non-voting, but other services would not...
>>
>
> yeah that seems like one solution. However, why is BGP even using the EOS
> in that case to establish a connection? I.e. if every node is intended to
> establish its own connection, then it seems you don't need EOS to decide
> that...
>

BTW, this is not just for planned maintenance but also for DR, should the
entire primary site become unavailable. Either way, I don't think changing
some setting dynamically prior to fail-over (and fail-back) is a viable
solution; it would have to be a permanent setting at installation/deployment
time, similar to the configuration that puts the BGP data in a local,
non-replicated shard. So perhaps there could be an additional "clustered"
setting in the BGP module that tells it whether or not to use EOS.
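The suggested deploy-time switch might look roughly like this. Everything here is hypothetical (the thread does not show any such flag in the BGP module): `start_speaker`, `register_with_eos` and `open_connection` stand in for real BGP and EOS wiring. With `clustered=False` the node behaves like a single-node deployment and always opens its own session; with `clustered=True` it only does so when the EOS reports local ownership.

```python
from typing import Callable


def start_speaker(clustered: bool,
                  register_with_eos: Callable[[Callable[[bool], None]], None],
                  open_connection: Callable[[], None]) -> None:
    """Hypothetical startup path gated by a deploy-time 'clustered' setting."""
    if not clustered:
        # Local, non-replicated deployment: every node opens its own session,
        # so no EOS election is involved.
        open_connection()
        return

    def on_ownership_change(is_owner: bool) -> None:
        # Clustered deployment: only the elected owner opens the session.
        if is_owner:
            open_connection()

    register_with_eos(on_ownership_change)
```

Closing the session when ownership is lost is left out to keep the sketch short.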


>
>
>>
>> Regards,
>> Robert
>>
>>


Re: [controller-dev] Entity ownership for non-voting member

2018-11-07 Thread Tom Pantelis
On Wed, Nov 7, 2018 at 3:36 PM Robert Varga  wrote:

> On 07/11/2018 20:36, Ajay Lele wrote:
> >
> >
> > On Wed, Nov 7, 2018 at 3:58 AM Robert Varga wrote:
> >
> > On 07/11/2018 02:07, Ajay Lele wrote:
> > > Hi Controller-devs,
> > >
> > > [0] changed EOS behavior such that a non-voting member cannot
> become
> > > entity owner. But there are some situations where we want to allow
> > this
> > > e.g. when BGP speaker config is local to node and not replicated in
> > > cluster. I think the behavior should be that non-voting member
> > will not
> > > become entity owner *provided* one or more voting candidates are
> > > available. Thoughts?
> >
> > This is a sticky topic. Non-voting members are meant to be used for
> > geo-redundancy only, in which case non-voting side should be really
> > passive.
> >
> > Can you describe the deployment scenario in more detail?
> >
> >
> > One scenario is BGP route injection using application RIB. Injected
> > route will be withdrawn when BGP connection over which it is advertised
> > goes down. To prevent downtime window when primary to DR cutover
> > happens, BGP connection is established from non-voting member from DR
> > site as well. This is accomplished by moving BGP speaker data to a
> > separate shard which is not replicated.
>
> I see, so this is used for planned maintenance only, right?
>
> If that is the case, I think it would make sense to make the selection
> policy switchable at runtime, such that under normal operation only
> voting members are considered, but in preparation for a switchover, the
> policy can be adjusted. Can you file an improvement issue against
> controller, please?
>
> Tom: do you have an opinion on how best implement this? It feels like we
> want to have a per-entity behavior table, so that BGP would switch to
> non-voting, but other services would not...
>

yeah that seems like one solution. However, why is BGP even using the EOS in
that case to establish a connection? I.e. if every node is intended to
establish its own connection, then it seems you don't need EOS to decide
that...


>
> Regards,
> Robert
>
>


Re: [controller-dev] [release] Autorelease oxygen failed to build sal-distributed-datastore from controller

2018-11-07 Thread Tom Pantelis
On Wed, Nov 7, 2018 at 9:05 PM Thanh Ha 
wrote:

> On Thu, Nov 8, 2018 at 9:45 AM Robert Varga  wrote:
>
>> On 08/11/2018 02:09, Jenkins wrote:
>> > Attention controller-devs,
>> >
>> > Autorelease oxygen failed to build sal-distributed-datastore from
>> controller in build
>> > 477. Attached is a snippet of the error message related to the
>> > failure that we were able to automatically parse as well as console
>> logs.
>> >
>> >
>> > Console Logs:
>> >
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/autorelease-release-oxygen/477
>>
>> This is the third time Oxygen autorelease has failed on CDS unit tests,
>> which are (inherently) time-sensitive. Each time it was a different test
>> and while we can increase timeouts
>> (https://git.opendaylight.org/gerrit/77597), these tests have been
>> stable for a long time -- which, coupled with other performance-related
>> questions cropping up, is leading me to ask:
>>
>> What is happening to our build infrastructure lately?
>>
>
> We haven't really made any changes to infrastructure lately. The main things
> have just been occasional upgrades to Jenkins and such.
>
> Are we sizing the VMs differently?
>>
>
> Nope, we are still running on the same centos7-autorelease-8c-32g VM that
> we've been using since July https://git.opendaylight.org/gerrit/74305 due
> to older branches of ODL requiring more resources, and we haven't reverted
> that patch since, so if anything we are building on an even larger system
> for autorelease than we used to need.
>
> Can we get a statement from our cloud provider, please?
>>
>
> I've CC'd Mo on this thread.
>

It's also interesting that these 3 failures only occurred on the Oxygen AR and
not on Fluorine or Neon, and there have been no code changes in those areas for
quite some time.


>
> Regards,
> Thanh
>


Re: [controller-dev] [release] Autorelease oxygen failed to build sal-distributed-datastore from controller

2018-11-05 Thread Tom Pantelis
On Mon, Nov 5, 2018 at 12:17 PM Robert Varga  wrote:

> On 05/11/2018 18:14, Sam Hague wrote:
> > Is this any way related to Lori's connection failures in the lisp IT.
> > Likely a stretch but they had connection issues also.
>
> Could be, if the VMs are experiencing hiccups. Then again it might have
> been a one-off... Oxygen has been just sitting there for a looong time.
>

Yeah, this UT failure may have been b/c the timeout deadline (3 sec)
happened to be a bit too short for that particular run due to a hiccup. I
always give UT deadlines at least a 5 sec cushion for that reason, even though
it *should* normally take on the order of milliseconds.
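The "generous cushion" rule can be illustrated with a small Python analogue of an Akka TestKit-style `expectMsgClass` wait, using only `queue.Queue.get` with a timeout. `expect_msg_class` and the stand-in `ConnectClientRequest` class are hypothetical and unrelated to the real ODL test. A 5-second default costs nothing when the message arrives within milliseconds, but tolerates a longer scheduling stall than a 3-second deadline would.

```python
import queue


class ConnectClientRequest:
    """Stand-in message type for illustration only."""


def expect_msg_class(inbox, cls, timeout=5.0):
    """Wait up to `timeout` seconds for the next message and check its type."""
    try:
        msg = inbox.get(timeout=timeout)
    except queue.Empty:
        raise AssertionError(
            f"timeout ({timeout} seconds) waiting for {cls.__name__}") from None
    if not isinstance(msg, cls):
        raise AssertionError(f"expected {cls.__name__}, got {type(msg).__name__}")
    return msg
```

The wait returns as soon as the message arrives, so a long deadline only slows down the failing case, never the passing one.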


> Regards,
> Robert
>
>


Re: [controller-dev] [release] Autorelease oxygen failed to build sal-distributed-datastore from controller

2018-11-05 Thread Tom Pantelis
On Mon, Nov 5, 2018 at 5:35 AM Michael Vorburger 
wrote:

> On Mon, Nov 5, 2018 at 2:07 AM Jenkins 
> wrote:
>
>> Attention controller-devs,
>>
>> Autorelease oxygen failed to build sal-distributed-datastore from
>> controller in build
>> 474. Attached is a snippet of the error message related to the
>> failure that we were able to automatically parse as well as console logs.
>>
>>
>> Console Logs:
>>
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/autorelease-release-oxygen/474
>>
>> Jenkins Build:
>>
>> https://jenkins.opendaylight.org/releng/job/autorelease-release-oxygen/474/
>
>
> Results :
>
> Failed tests:
>   ModuleShardBackendResolverTest.testRefreshBackendInfo:138 assertion
> failed: timeout (3 seconds) during expectMsgClass waiting for class
> org.opendaylight.controller.cluster.access.commands.ConnectClientRequest
>

I have never seen this test fail. I don't have much familiarity with this
code (Robert wrote it) and I wouldn't be able to look into it until later
next week at the earliest. Hopefully it doesn't somehow become frequent now;
AFAIK there have been no changes in quite some time, so most likely it's a
rare timing issue in the UT.


>
> Tests run: 866, Failures: 1, Errors: 0, Skipped: 6
>
> https://jira.opendaylight.org/browse/CONTROLLER-1868
>
>


Re: [controller-dev] aaa build failure in proposed topic:neon-mri ("Weather Item" TSC-132)

2018-09-23 Thread Tom Pantelis
On Sun, Sep 23, 2018 at 11:34 AM Michael Vorburger 
wrote:

> On Sun, Sep 23, 2018 at 3:21 PM Tom Pantelis 
> wrote:
>
>> On Sun, Sep 23, 2018 at 7:26 AM Michael Vorburger 
>> wrote:
>>
>>> Dear maintainers of project aaa,
>>>
>>> While verifying the proposed cross-projects changes on managed
>>> topic:neon-mri together, your project failed to build; please see
>>> https://jenkins.opendaylight.org/releng/view/integration/job/integration-multipatch-test-neon/26/console
>>> .
>>>
>>> IMHO this is blocking topic:neon-mri / TSC-132 and one of us should see
>>> how we can sort this out:
>>>
>>> Running org.opendaylight.odlparent.featuretest.SingleFeatureTest
>>> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 315.015
>>> sec <<< FAILURE! - in
>>> org.opendaylight.odlparent.featuretest.SingleFeatureTest
>>> installFeatureCatchAndLog(org.opendaylight.odlparent.featuretest.SingleFeatureTest)[repoUrl:
>>> file:/w/workspace/integration-multipatch-test-neon/patch_tester/aaa/features/odl-aaa-password-service/target/feature/feature.xml,
>>> Feature: odl-aaa-password-service 0.9.0.SNAPSHOT]  Time elapsed: 314.722
>>> sec  <<< ERROR!
>>> org.awaitility.core.ConditionTimeoutException: Condition with alias
>>> 'checkBundleDiagInfos' didn't complete within 300 seconds because lambda
>>> expression in org.opendaylight.odlparent.bundlestest.lib.TestBundleDiag:
>>> expected system either ready with all bundles Active, or Stopping or
>>> Failure (but not still booting in GracePeriod, Waiting, Starting,
>>> Unknown;but just Resolved and some exceptional Installed OK) but was >> Booting {Installed=0, Resolved=5, Unknown=0, GracePeriod=1, Waiting=0,
>>> Starting=0, Active=101, Stopping=0, Failure=0}
>>> 1. NOK org.opendaylight.aaa.password-service-impl:0.9.0.SNAPSHOT: OSGi
>>> state = Active, Karaf bundleState = GracePeriod, due to: Blueprint
>>> 9/23/18 10:55 AM
>>> Missing dependencies:
>>>
>>> (&(objectClass=org.apache.aries.blueprint.NamespaceHandler)(osgi.service.blueprint.namespace=
>>> http://opendaylight.org/xmlns/blueprint/v1.0.0))
>>> >.
>>>
>>
>> This is b/c https://git.opendaylight.org/gerrit/#/c/74964/ moved the
>> aaa-password-service BP xml file under OSGI-INF/blueprint. However the
>> feature does not pull in the ODL blueprint bundles, either directly or
>> indirectly via odl-mdsal-broker-local.
>> So it either needs to pull in odl-mdsal-broker-local or we create a
>> feature for the ODL blueprint bundle. For the short-term, that patch
>> doesn't need to
>> move  the BP xml file for the MRI version bumps so we could put it back
>> under org/opendaylight/blueprint for now and address it in another patch.
>>
>
> I see. For the very short-term and to unblock topic:neon-mri (I'm curious
> to see how far we can get the multipatch job to progress by all working
> together this week!) I agree and too would go for the latter and leave
> them in org/opendaylight/blueprint instead of moving them to
> OSGI-INF/blueprint in c/74964 (NB not just password-service-blueprint.xml
> but all BP XML).
>
> Robert, as the author of c/74964 would you like to amend it to do so? If
> you don't have time but confirm that you agree this is what should be done,
> then I'm happy to do this as well, in order to unblock.
>
> But given that we want to converge on OSGI-INF/blueprint (and explicitly
> ask projects in the migration documentation on
> https://wiki.opendaylight.org/view/Neon_platform_upgrade#Blueprint_declarations
> ...) I think it would be useful to do this uniformly soon-ish, so let's
> make a plan for that as well, in parallel to fixing the short term?
>

I started that a while ago in mdsal
https://git.opendaylight.org/gerrit/#/c/75528/, but it needed odlparent
4.0.0 to remove the Import{Export}-Service headers. I also have a
controller draft patch to follow. However, that is running into the same
issue with the missing ODL BP NamespaceHandler. Interestingly, we
didn't see an issue before with the files under org/opendaylight/blueprint
b/c no BP extender was triggered to try to load them, which wasn't
good b/c we weren't actually testing the BP wiring during SFT.


> It should be easy enough to raise a change in controller to have a new
> odl-blueprint feature if that's what we want (and I'm happy to), but... do
> we really want to? Would you then want to add that explicitly to
> odl-aaa-password-service, and elsewhere where we hit this problem? I
> don't really understand how it's possible for a bundle to u

Re: [controller-dev] aaa build failure in proposed topic:neon-mri ("Weather Item" TSC-132)

2018-09-23 Thread Tom Pantelis
On Sun, Sep 23, 2018 at 7:26 AM Michael Vorburger 
wrote:

> Dear maintainers of project aaa,
>
> While verifying the proposed cross-projects changes on managed
> topic:neon-mri together, your project failed to build; please see
> https://jenkins.opendaylight.org/releng/view/integration/job/integration-multipatch-test-neon/26/console
> .
>
> IMHO this is blocking topic:neon-mri / TSC-132 and one of us should see
> how we can sort this out:
>
> Running org.opendaylight.odlparent.featuretest.SingleFeatureTest
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 315.015
> sec <<< FAILURE! - in
> org.opendaylight.odlparent.featuretest.SingleFeatureTest
> installFeatureCatchAndLog(org.opendaylight.odlparent.featuretest.SingleFeatureTest)[repoUrl:
> file:/w/workspace/integration-multipatch-test-neon/patch_tester/aaa/features/odl-aaa-password-service/target/feature/feature.xml,
> Feature: odl-aaa-password-service 0.9.0.SNAPSHOT]  Time elapsed: 314.722
> sec  <<< ERROR!
> org.awaitility.core.ConditionTimeoutException: Condition with alias
> 'checkBundleDiagInfos' didn't complete within 300 seconds because lambda
> expression in org.opendaylight.odlparent.bundlestest.lib.TestBundleDiag:
> expected system either ready with all bundles Active, or Stopping or
> Failure (but not still booting in GracePeriod, Waiting, Starting,
> Unknown;but just Resolved and some exceptional Installed OK) but was  Booting {Installed=0, Resolved=5, Unknown=0, GracePeriod=1, Waiting=0,
> Starting=0, Active=101, Stopping=0, Failure=0}
> 1. NOK org.opendaylight.aaa.password-service-impl:0.9.0.SNAPSHOT: OSGi
> state = Active, Karaf bundleState = GracePeriod, due to: Blueprint
> 9/23/18 10:55 AM
> Missing dependencies:
>
> (&(objectClass=org.apache.aries.blueprint.NamespaceHandler)(osgi.service.blueprint.namespace=
> http://opendaylight.org/xmlns/blueprint/v1.0.0))
> >.
>

This is b/c https://git.opendaylight.org/gerrit/#/c/74964/ moved the
aaa-password-service BP XML file under OSGI-INF/blueprint. However, the
feature does not pull in the ODL blueprint bundles, either directly or
indirectly via odl-mdsal-broker-local.
So either it needs to pull in odl-mdsal-broker-local or we create a feature
for the ODL blueprint bundle. In the short term, that patch doesn't need to
move the BP XML file for the MRI version bumps, so we could put it back
under org/opendaylight/blueprint for now and address it in another patch.
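One quick way to see which bundles are affected is to look inside the bundle jar. The Python sketch below (standard library `zipfile` only) lists blueprint XML files under the standard `OSGI-INF/blueprint/` location and the legacy `org/opendaylight/blueprint/` location, and flags those that declare the ODL blueprint namespace `http://opendaylight.org/xmlns/blueprint/v1.0.0`. Following the explanation above, a file that sits under `OSGI-INF/blueprint/` and uses that namespace will wait on the ODL NamespaceHandler unless the feature pulls in the ODL blueprint bundle. The helper names are hypothetical, and a plain substring match on the namespace URI is a simplification of real XML namespace handling.

```python
import io
import zipfile

ODL_BP_NAMESPACE = "http://opendaylight.org/xmlns/blueprint/v1.0.0"
STANDARD_BP_DIR = "OSGI-INF/blueprint/"          # default Blueprint location
LEGACY_ODL_BP_DIR = "org/opendaylight/blueprint/"


def scan_blueprint(jar_bytes):
    """Map each blueprint XML path in the jar to whether it uses the ODL namespace."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for name in jar.namelist():
            if name.endswith(".xml") and name.startswith((STANDARD_BP_DIR, LEGACY_ODL_BP_DIR)):
                text = jar.read(name).decode("utf-8", errors="replace")
                found[name] = ODL_BP_NAMESPACE in text
    return found


def needs_odl_namespace_handler(found):
    """Standard-location files that require the ODL NamespaceHandler at runtime."""
    return sorted(path for path, uses_odl in found.items()
                  if uses_odl and path.startswith(STANDARD_BP_DIR))
```

For a real bundle, pass the jar's bytes, e.g. `scan_blueprint(open(path_to_jar, "rb").read())`, where `path_to_jar` is whatever bundle you want to inspect.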


>
> Yours sincerely,
> M. for the ODL Bot 
>
>
> https://git.opendaylight.org/gerrit/#/q/topic:neon-mri
>
>
> https://jenkins.opendaylight.org/releng/view/integration/job/integration-multipatch-test-neon/26/console
>


Re: [controller-dev] owner changed failure

2018-09-19 Thread Tom Pantelis
On Wed, Sep 19, 2018 at 4:46 PM Jamo Luhrsen  wrote:

> > Anyway we'll need to enable debug
> for org.opendaylight.controller.cluster.datastore.entityownership.  I would
> suggest to
> > pull out that test on its own like you've done before (run it standalone
> in sandbox I guess). Also delete the log files
> > in between each run. This will make debugging much easier.
>
> Is there not enough info in the existing logs?


No - not with the default INFO logging. In order to dig deeper we need to
enable targeted debug, in this
case org.opendaylight.controller.cluster.datastore.entityownership.


> You can easily trim the
> karaf logs based on the test cases. We are logging the start of every
> suite and test case.
>
>
I think it's much easier and faster to debug a failing test if it's
isolated. Of course the logs are much smaller and don't require trimming.
Enabling debug can result in huge logs even when running just one test, let
alone the whole batch of them. Also, I assume this test fails sporadically,
which means it needs to be run over and over. Doing that with the entire
job will take a long time. But if it's too much of a pain to isolate the
test, then OK.
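Running an isolated test "over and over" until it trips can be scripted. A minimal sketch in Java, assuming the test body can be invoked directly; FlakyTestRepeater, ThrowingRunnable and repeatUntilFailure are hypothetical names, not an ODL, JUnit or Robot API. It stops at the first failure and reports which run it was, which is the number you want when a failure only shows up after 20, 50 or hundreds of runs.

```java
/**
 * Hypothetical harness (not an ODL or JUnit API): runs a test body repeatedly
 * and stops at the first failure, reporting which iteration failed.
 */
public final class FlakyTestRepeater {
    /** A test body that may throw checked exceptions or AssertionError. */
    @FunctionalInterface
    public interface ThrowingRunnable {
        void run() throws Exception;
    }

    /** Number of iterations executed and the first failure seen (null if none). */
    public static final class Result {
        public final int iterations;
        public final Throwable firstFailure;

        Result(int iterations, Throwable firstFailure) {
            this.iterations = iterations;
            this.firstFailure = firstFailure;
        }
    }

    private FlakyTestRepeater() {
    }

    public static Result repeatUntilFailure(ThrowingRunnable testBody, int maxIterations) {
        for (int i = 1; i <= maxIterations; i++) {
            try {
                testBody.run();
            } catch (Exception | AssertionError e) {
                // Test assertions throw AssertionError, which is an Error, not an Exception.
                return new Result(i, e);
            }
        }
        return new Result(maxIterations, null);
    }

    public static void main(String[] args) {
        // Illustrative body failing on roughly 1 in 200 runs, standing in for a timing-sensitive test.
        java.util.Random random = new java.util.Random();
        Result result = repeatUntilFailure(() -> {
            if (random.nextInt(200) == 0) {
                throw new AssertionError("timing-dependent failure");
            }
        }, 1000);
        System.out.println(result.firstFailure == null
                ? "no failure in " + result.iterations + " runs"
                : "failed on run " + result.iterations + ": " + result.firstFailure);
    }
}
```

Deleting the log files between iterations (as suggested earlier in the thread) would go inside the test body or just before each call.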


Re: [controller-dev] ODL Cassandra Persistence

2018-09-18 Thread Tom Pantelis
On Tue, Sep 18, 2018 at 5:12 PM Michael Vorburger 
wrote:

> On Tue, 18 Sep 2018, 23:04 sat,  wrote:
>
>> Hi,
>>
>> Yes, we were looking for a project like this. Unfortunately the project
>> is discontinued.
>>
>
> https://github.com/akka/akka-persistence-cassandra seems to be active?
>

yeah that's the right one - sorry. There's a bunch of akka persistence
plugins out there. Try it out and see if it works for you.


>
> Thanks
>> A.SathishKumar
>>
>> On Tue, Sep 18, 2018 at 6:54 AM Tom Pantelis 
>> wrote:
>>
>>>
>>>
>>> On Mon, Sep 17, 2018 at 11:28 PM sat  wrote:
>>>
>>>> Hi Michael Vorburger,
>>>>
>>>> Thanks, i will check it out.
>>>>
>>>> Thanks
>>>> A.SathishKumar
>>>>
>>>
>>>
>>> There is an akka persistence plugin for Cassandra -
>>> https://github.com/krasserm/akka-persistence-cassandra.  I think this
>>> is what you're looking for.
>>>
>>>
>>>>
>>


Re: [controller-dev] ODL Cassandra Persistence

2018-09-18 Thread Tom Pantelis
On Mon, Sep 17, 2018 at 11:28 PM sat  wrote:

> Hi Michael Vorburger,
>
> Thanks, i will check it out.
>
> Thanks
> A.SathishKumar
>


There is an akka persistence plugin for Cassandra -
https://github.com/krasserm/akka-persistence-cassandra. I think this is
what you're looking for.


>
> On Mon, Sep 17, 2018 at 3:13 PM Michael Vorburger 
> wrote:
>
>>
>> Sat,
>>
>> On Thu, Sep 13, 2018 at 2:07 AM sat  wrote:
>>
>>> Hi,
>>>
>>> ODL uses "LevelDB" for persistence; we came to know that it's prone to
>>> corruption. Has anyone tried using Cassandra for persistence rather than
>>> LevelDB?
>>>
>>> I see some posts with the same requirement, but there is no reply.
>>>
>>
>> https://pantheon.tech/cassandra-datastore/ is a blog post which may
>> interest you in this context; it's from a company that I am not affiliated
>> with (and won't be able to further comment on here).
>>
>> BTW: https://github.com/vorburger/opendaylight-etcd is somewhat related
>> WIP work in FLOSS where I'm actively exploring the use of etcd (not
>> Cassandra) as a data store.
>>
>> Tx,
>> M.
>> --
>> Michael Vorburger, Red Hat
>> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>>
>>
>
> --
> A.SathishKumar
> 044-24735023
>


Re: [controller-dev] controller cluster csit suites for ask or tell or both

2018-08-30 Thread Tom Pantelis
On Thu, Aug 30, 2018 at 5:44 PM, Jamo Luhrsen  wrote:

> Tom, Vratko, or any other expert,
>
> Can you help us figure out this patch:
>
> https://git.opendaylight.org/gerrit/#/c/74692/
>
> The goal is to split the suites into two jobs so we can have
> an ask-based job and a tell-based job. It seems that some of the
> suites will work in both, but others are only intended for one or
> the other.
>
> This will give us better clarity as to the stability of each
> protocol over time. I'm guessing this is going to require a little
> digging into each suite's robot .html files to try and read
> what's going on.
>
> ask based:
> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/builder-copy-sandbox-logs/374/thapar-controller-csit-3node-clustering-ask-all-oxygen/1/robot-plugin/log.html.gz
>
> tell based:
> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/builder-copy-sandbox-logs/375/thapar-controller-csit-3node-clustering-tell-all-oxygen/1/robot-plugin/log.html.gz



From
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/controller-csit-3node-clustering-all-fluorine/185/robot-plugin/log.html.gz,
each test is preceded by "Restart Odl With Tell Based True/False", so I
think we can go by that pattern to start.


>
>
> Thanks,
> JamO
>


Re: [controller-dev] trying to repro CONTROLLER-1849 w/ artery

2018-08-28 Thread Tom Pantelis
On Tue, Aug 28, 2018 at 6:50 PM, Jamo Luhrsen  wrote:

> Tom, Robert, others,
>
> the good news is I finally got a few things figured out and I can make
> a local 3 node cluster work with artery (using udp).
>
> The bad news is that I keep running out of shared memory (after not
> too many iterations of my test).
>
> I am launching my docker containers with 5G of /dev/shm, but something
> is eating it up when I'm killing and restarting the controller in a
> single container. Once it's gone, artery craps out and can't send any
> messages, so the test is worthless after that.
>
> I need to find a way to clean/flush whatever's using that memory after
> each iteration, I think.
>
> anyone with any expertise here?
>

I assume this is b/c killing the process doesn't free up the space. You can
set the dir aeron uses via *artery.advanced.aeron-dir*.

# Directory used by the Aeron media driver. It's mandatory to define the 'aeron-dir'
# if using an external media driver, i.e. when 'embedded-media-driver = off'.
# The embedded media driver will use this directory, or a temporary directory if this
# property is not defined (empty).
# Only used when transport is aeron-udp.
aeron-dir = ""

So perhaps set it to a dedicated dir and delete its contents after every
run.

However, I would first suggest running the test with a graceful shutdown
to see if that properly cleans up /dev/shm.
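If a dedicated aeron-dir is used, the "delete the contents after every run" step could look like the minimal Java sketch below. AeronDirCleaner is a hypothetical helper, and the default path /dev/shm/odl-aeron is an assumption; it has to match whatever artery.advanced.aeron-dir is set to in akka.conf. It should only be run while no controller JVM is using that directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

/**
 * Hypothetical test-harness helper: empties a dedicated Aeron directory between
 * controller runs so files left behind by a killed JVM stop consuming /dev/shm.
 * Only run it while no controller JVM is using the directory.
 */
public final class AeronDirCleaner {
    private AeronDirCleaner() {
    }

    /** Deletes everything under {@code aeronDir} but keeps the directory itself. */
    public static void clearContents(Path aeronDir) throws IOException {
        if (!Files.isDirectory(aeronDir)) {
            return; // nothing to clean, e.g. before the first run
        }
        try (Stream<Path> paths = Files.walk(aeronDir)) {
            paths.sorted(Comparator.reverseOrder()) // deepest entries first, so directories are empty when deleted
                 .filter(p -> !p.equals(aeronDir))
                 .forEach(p -> {
                     try {
                         Files.delete(p);
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        } catch (UncheckedIOException e) {
            throw e.getCause(); // surface the original IOException to the caller
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default; it must match the aeron-dir value configured in akka.conf.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/dev/shm/odl-aeron");
        clearContents(dir);
    }
}
```

Keeping the directory itself (rather than deleting and recreating it) avoids surprises if its permissions or mount were set up by the container.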


>
> Thanks,
> JamO
>


Re: [controller-dev] akka w/ artery in docker not working

2018-08-22 Thread Tom Pantelis
On Wed, Aug 22, 2018 at 4:57 PM, Jamo Luhrsen  wrote:

> I am looking for some help/ideas. I am running three containers
> on my laptop so I can test some cluster bugs locally. I am fine
> with the netty based akka remoting (our default), but I've been
> asked to reproduce a bug with the artery remoting.
>
> Every time I start my controllers with artery, they all die with:
>
> ERROR | opendaylight-cluster-data-akka.actor.default-dispatcher-56 | ActorSystemImpl | 41 - com.typesafe.akka.slf4j - 2.5.11 | Uncaught error from thread [opendaylight-cluster-data-akka.remote.default-remote-dispatcher-7]: Direct buffer memory, shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[opendaylight-cluster-data]
> java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:695) ~[?:?]
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[?:?]
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[?:?]
> at akka.remote.artery.EnvelopeBufferPool.acquire(EnvelopeBufferPool.scala:34) ~[40:com.typesafe.akka.remote:2.5.11]
> at akka.remote.artery.Encoder$$anon$2.onPush(Codecs.scala:93) ~[40:com.typesafe.akka.remote:2.5.11]
>
>
> I'm stuck in the mud on this one so far. I'm trying to tweak the
> shared memory setting in the docker run command (to no avail).
>
> The container stats are nowhere near any kind of limit.
>
> It's not happening with netty based.
>
> I've tried both udp and tcp protocols with the same OOM death.
>
> appreciate any ideas or pointers. I'm literally just throwing mud against
> the wall at this point hoping something will stick.
>
> TomP tells me it works on his laptop, but he's not using docker/containers,
> so that must be a clue.
>

The problem is this in configuration/factory/akka.conf:

artery {
  advanced {
    maximum-frame-size = 1 GiB
    maximum-large-frame-size = 1 GiB
  }
}

I had originally set these a while ago when we were first looking at artery,
before I realized it was using direct memory, and forgot to remove them. I'll
submit a patch to do that - in the meantime you can remove them locally.



>
> Thanks,
> JamO
>


Re: [controller-dev] MD-SAL: Startup Project Archetype Seems Stopped Working Since Release Nitrogen

2018-07-24 Thread Tom Pantelis
On Tue, Jul 24, 2018 at 1:05 PM, harry.zh...@us.fujitsu.com <
harry.zh...@us.fujitsu.com> wrote:

> Hi Tom,
>
>
>
> I think that I must get this fixed for Nitrogen release. May I ask whether
> it is possible?
>
>
>

As I mentioned, the bundle with the class has to be installed, so make sure
it's in a feature you install into karaf. Remember, the archetype is just a
starting point anyway.


>
> Best Regards,
>
>
>
> Harry
>
>
>
>
>
> *From:* Michael Vorburger [mailto:vorbur...@redhat.com]
> *Sent:* Tuesday, July 24, 2018 12:01 PM
> *To:* Zhang, Harry 
> *Cc:* Tom Pantelis ; controller-dev <
> controller-dev@lists.opendaylight.org>
>
> *Subject:* Re: [controller-dev] MD-SAL: Startup Project Archetype Seems
> Stopped Working Since Release Nitrogen
>
>
>
> Harry,
>
>
>
> On Tue, Jul 24, 2018 at 6:57 PM harry.zh...@us.fujitsu.com <
> harry.zh...@us.fujitsu.com> wrote:
>
> Hi Tom,
>
>
>
> May I ask if you remember how to fix the problem? I could not find the
> previous record.
>
>
>
> if you are willing to work with bleeding edge (master Fluorine, which is
> about to be frozen and released; in August) then please use the new
> archetype and its documentation, see
> https://docs.opendaylight.org/en/latest/developer-guide/developing-apps-on-the-opendaylight-controller.html
>
>
>
> Tx,
>
> M.
>
> --
>
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>
>
>
> Thanks,
>
>
>
> Harry
>
>
>
>
>
> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
> *Sent:* Tuesday, July 24, 2018 11:43 AM
> *To:* Zhang, Harry 
> *Cc:* controller-dev@lists.opendaylight.org
> *Subject:* Re: [controller-dev] MD-SAL: Startup Project Archetype Seems
> Stopped Working Since Release Nitrogen
>
>
>
>
>
>
>
> On Tue, Jul 24, 2018 at 12:35 PM, harry.zh...@us.fujitsu.com <
> harry.zh...@us.fujitsu.com> wrote:
>
> Hi Team,
>
>
>
> I have been following the procedures,
> https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:Startup_Project_Archetype,
> to create projects. It works with Carbon releases. Recently when I tried it
> with Nitrogen releases, the project creation and compilation were fine, but
> when installed with karaf feature:install, the init method of the Provider
> was not called. I tried all three Nitrogen releases and they have the same
> problem.
>
>
>
> The generation command that I used,
>
>
>
> mvn archetype:generate -DarchetypeGroupId=org.opendaylight.controller
> -DarchetypeArtifactId=opendaylight-startup-archetype
> -DarchetypeRepository=http://nexus.opendaylight.org/content/repositories/opendaylight.releas/
> -DarchetypeCatalog=remote -DarchetypeVersion=1.4.3
>
>
>
> Define value for property 'groupId': : org.opendaylight.example
>
> Define value for property 'artifactId': : example
>
> Define value for property 'package':  org.opendaylight.example: :
>
> Define value for property 'classPrefix':  ${artifactId.substring(0,1).toUpperCase()}${artifactId.substring(1)}
>
> Define value for property 'copyright': : Yoyodyne, Inc.
>
>
>
> cd example/
>
> mvn clean install
>
> cd karaf/target/assembly/bin
>
> ./karaf
>
> log:display | grep Example
>
> The last command showed nothing. After I did feature:install and installed
> the Example feature without a problem, it still did not show “ExampleProvider
> Session Initiated”.

Re: [controller-dev] MD-SAL: Startup Project Archetype Seems Stopped Working Since Release Nitrogen

2018-07-24 Thread Tom Pantelis
On Tue, Jul 24, 2018 at 12:35 PM, harry.zh...@us.fujitsu.com <
harry.zh...@us.fujitsu.com> wrote:

> Hi Team,
>
>
>
> I have been following the procedures,
> https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:Startup_Project_Archetype, to create
> projects. It works with Carbon releases. Recently when I tried it with
> Nitrogen releases, the project creation and compilation were fine, but when
> installed with karaf feature:install, the init method of the Provider was
> not called. I tried all three Nitrogen releases and they have the same
> problem.
>
>
>
> The generation command that I used,
>
>
>
> mvn archetype:generate -DarchetypeGroupId=org.opendaylight.controller
> -DarchetypeArtifactId=opendaylight-startup-archetype
> -DarchetypeRepository=http://nexus.opendaylight.org/content/repositories/opendaylight.releas/
> -DarchetypeCatalog=remote -DarchetypeVersion=1.4.3
>
>
>
> Define value for property 'groupId': : org.opendaylight.example
>
> Define value for property 'artifactId': : example
>
> Define value for property 'package':  org.opendaylight.example: :
>
> Define value for property 'classPrefix':  ${artifactId.substring(0,1).toUpperCase()}${artifactId.substring(1)}
>
> Define value for property 'copyright': : Yoyodyne, Inc.
>
>
>
> cd example/
>
> mvn clean install
>
> cd karaf/target/assembly/bin
>
> ./karaf
>
> log:display | grep Example
>
> The last command showed nothing. After I did feature:install and installed
> the Example feature without a problem, it still did not show “ExampleProvider
> Session Initiated”.
>
>
>
> In the karaf.log, there is no “ExampleProvider Session Initiated” either.
> After that, I added a rpc to the service, when browse the rpc from apidoc
> url, it says the api was not implemented.
>
>
>
> Could somebody tell me whether some steps changed for bringing up a
> feature?
>
>
>
> I redid the creation with the Carbon release using exactly the same procedure,
> and there were no problems.
>
>
>
>
>

This has come up before in previous mailing list discussions and I don't
recall the details, but FWIC it's b/c the bundle that contains the
ExampleProvider isn't installed, either b/c it's not included in a feature
or b/c that feature isn't installed by the generated local karaf distro.


>
>
>
>
> Thanks,
>
>
>
> Harry
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


Re: [controller-dev] [release] Autorelease oxygen failed to build sal-cluster-admin-impl from controller

2018-07-20 Thread Tom Pantelis
On Fri, Jul 20, 2018 at 10:54 AM, Daniel Farrell 
wrote:

> On Fri, Jul 20, 2018 at 10:36 AM Thanh Ha 
> wrote:
>
>> On Fri, Jul 20, 2018 at 10:01 AM Tom Pantelis 
>> wrote:
>>
>>> On Fri, Jul 20, 2018 at 4:48 AM, Anil Belur 
>>> wrote:
>>>
>>>> On Fri, Jul 20, 2018 at 11:12 AM Jenkins >>> opendaylight.org> wrote:
>>>>
>>>>> Attention controller-devs,
>>>>>
>>>>> Autorelease oxygen failed to build sal-cluster-admin-impl from
>>>>> controller in build
>>>>> 359. Attached is a snippet of the error message related to the
>>>>> failure that we were able to automatically parse as well as console
>>>>> logs.
>>>>>
>>>>> Console Logs:
>>>>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/
>>>>> autorelease-release-oxygen/359
>>>>>
>>>>> Jenkins Build:
>>>>> https://jenkins.opendaylight.org/releng/job/autorelease-
>>>>> release-oxygen/359/
>>>>>
>>>>> Please review and provide an ETA on when a fix will be available.
>>>>>
>>>>> Thanks,
>>>>> ODL releng/autorelease team
>>>>>
>>>>  Hello controller-dev:
>>>>
>>>> Please look into these failed tests.
>>>>
>>>> Failed tests:
>>>>   ClusterAdminRpcServiceTest.testFlipMemberVotingStates:976->lambda$
>>>> testFlipMemberVotingStates$8:978 Expected leader member-1. Actual:
>>>> member-1-shard-cars-oper_testFlipMemberVotingStates
>>>>
>>>> Tests run: 17, Failures: 1, Errors: 0, Skipped: 0
>>>>
>>>
>>>
>>> I ran it successfully 500 times locally. But looking at the code and the
>>> test output from jenkins, I can see why it failed - just the right
>>> timing sequence coupled with just enough of a random thread execution delay
>>> and a deadline timeout set by the test being just a tad too low for that
>>> delay.  I'll push a patch. Another case where occasionally it seems there's
>>> just enough of a slight delay or slowdown in the jenkins environment to
>>> throw off timing to cause a test failure.
>>>
>>
>> Hi Tom,
>>
>> I'm curious when you said you ran it successfully 500 times locally did
>> you perform a full build during that time or tested the single test case in
>> isolation?
>>
>> I found that while troubleshooting the bgpcep issue in the bgp-bmp-mock
>> thread [0] that I had to run a full bgpcep build in order to reproduce the
>> issue on my own laptop system. I have a script that I'm testing now and
>> making it more generic that I will share to this list later which will
>> allow us to continuously run builds whether it's autorelease or project
>> specific over and over infinitely and capture the maven output + surefire
>> logs output which I hope will help folks reproduce intermittent issues
>> locally.
>>
>> I feel like blaming infrastructure being "slow" is too easy an excuse for
>> issues. If the software were run in a customer production environment, I
>> suspect that telling the customer their hardware is too slow and not the
>> same hardware as the developer's laptop would not be a solution the
>> customer would be happy with.
>>
>
> +1000
>
> Our code and tests need to be robust enough to handle diverse
> infrastructure. Bugs like this might be highlighted by infra variability,
> but they are still bugs in code/tests.
>
> Not picking on TomP or Controller here, this is a general ODL culture
> problem of blaming the infra first and until Thanh/Jamo/et al prove
> otherwise.
>

I did run the test in isolation.

I'm really not trying to blame the infrastructure. In this case, it looks
like the test set up a deadline that was a bit too short for comfort. The
vast majority of the time it succeeds. I've seen this before with tests - it
fails on jenkins, then I run it locally in a loop - sometimes it will fail
after 20 or 50 runs and sometimes it takes hundreds of runs for the stars
to align and fail. In this case I didn't see it fail locally after
500 runs - maybe it would have after 1000. Certainly running a full build or all
the tests is different from just running one test class. I think arbitrary
delays can occur due to GC when multiple test classes run in the same JVM.
I was just noting that in this failure it looks like there was enough of a
slight delay, for whatever reason, in the jenkins run to throw things off,
and that could very well have been
Re: [controller-dev] [release] Autorelease oxygen failed to build sal-cluster-admin-impl from controller

2018-07-20 Thread Tom Pantelis
On Fri, Jul 20, 2018 at 4:48 AM, Anil Belur 
wrote:

>
>
> On Fri, Jul 20, 2018 at 11:12 AM Jenkins  opendaylight.org> wrote:
>
>> Attention controller-devs,
>>
>> Autorelease oxygen failed to build sal-cluster-admin-impl from controller
>> in build
>> 359. Attached is a snippet of the error message related to the
>> failure that we were able to automatically parse as well as console logs.
>>
>>
>> Console Logs:
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/
>> autorelease-release-oxygen/359
>>
>> Jenkins Build:
>> https://jenkins.opendaylight.org/releng/job/autorelease-
>> release-oxygen/359/
>>
>> Please review and provide an ETA on when a fix will be available.
>>
>> Thanks,
>> ODL releng/autorelease team
>>
>>
>  Hello controller-dev:
>
> Please look into these failed tests.
>
> Failed tests:
>   ClusterAdminRpcServiceTest.testFlipMemberVotingStates:976->lambda$
> testFlipMemberVotingStates$8:978 Expected leader member-1. Actual:
> member-1-shard-cars-oper_testFlipMemberVotingStates
>
> Tests run: 17, Failures: 1, Errors: 0, Skipped: 0
>


I ran it successfully 500 times locally. But looking at the code and the
test output from jenkins, I can see why it failed - just the right timing
sequence coupled with just enough of a random thread execution delay and a
deadline timeout set by the test being just a tad too low for that delay.
I'll push a patch. Another case where occasionally it seems there's just
enough of a slight delay or slowdown in the jenkins environment to throw
off timing to cause a test failure.
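The failure mode described here, a polling check whose deadline leaves too little headroom, can be illustrated with a minimal poll-until-deadline helper in Java. This is a sketch only: DeadlinePoller and awaitCondition are hypothetical names, and it is not the actual ClusterAdminRpcServiceTest code or its fix.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: polls a condition until it holds or a deadline passes. */
public final class DeadlinePoller {
    private DeadlinePoller() {
    }

    /**
     * Returns true as soon as {@code condition} is true, or false once {@code timeout}
     * has elapsed without it becoming true. A timeout that is only just long enough on a
     * developer laptop leaves no headroom for GC pauses or a slower CI host.
     */
    public static boolean awaitCondition(BooleanSupplier condition, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Compare via subtraction so the check stays correct if nanoTime wraps.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            TimeUnit.NANOSECONDS.sleep(pollInterval.toNanos());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        // Illustrative condition that becomes true after ~200 ms, standing in for "expected leader elected".
        boolean met = awaitCondition(() -> System.nanoTime() - start > TimeUnit.MILLISECONDS.toNanos(200),
                Duration.ofSeconds(10), Duration.ofMillis(20));
        System.out.println("condition met: " + met);
    }
}
```

Since the helper returns as soon as the condition holds, a generous timeout costs nothing on the common fast path; it only changes how long a genuinely broken run takes to fail.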


>
> Thanks,
> Anil
>
> ___
> release mailing list
> rele...@lists.opendaylight.org
> https://lists.opendaylight.org/mailman/listinfo/release
>
>


Re: [controller-dev] ODL abrupt restart - System.exit() via QuarantinedMonitorActorPropsFactory ?

2018-07-05 Thread Tom Pantelis
On Thu, Jul 5, 2018 at 1:42 PM, Michael Vorburger 
wrote:

> On Thu, Jul 5, 2018 at 7:39 PM, Tom Pantelis 
> wrote:
>
>> On Thu, Jul 5, 2018 at 1:35 PM, Michael Vorburger 
>> wrote:
>>
>>> Tom, or Robert, or anyone else having hit this themselves,
>>>
>>> would you be able to remind us what in clustering can cause an ODL
>>> abrupt restart - System.exit() via bundleContext.getBundle(0).stop();
>>> from https://github.com/opendaylight/controller/blob/master/opend
>>> aylight/md-sal/sal-distributed-datastore/src/main/java/org/
>>> opendaylight/controller/cluster/akka/osgi/impl/Quarant
>>> inedMonitorActorPropsFactory.java ?
>>>
>>> I do vaguely an "inconsistent cluster" leading to this - clarify exactly
>>> what situation leads to that? Loss of leader? Loss of majority?
>>>
>>> asking for https://bugzilla.redhat.com/show_bug.cgi?id=1597304 ...
>>>
>>
>> That happens when akka quarantines a node - it can no longer rejoin the
>> majority cluster unless the actor system is restarted, hence we restart the
>> whole JVM.
>>
>
> and what can cause Akka to have to quarantine a node?
>


An unrecoverable failure state - see
https://livingston.io/understanding-akkas-quarantine-state/ for more
detail.


Re: [controller-dev] ODL abrupt restart - System.exit() via QuarantinedMonitorActorPropsFactory ?

2018-07-05 Thread Tom Pantelis
On Thu, Jul 5, 2018 at 1:35 PM, Michael Vorburger 
wrote:

> Tom, or Robert, or anyone else having hit this themselves,
>
> would you be able to remind us what in clustering can cause an ODL abrupt
> restart - System.exit() via bundleContext.getBundle(0).stop(); from
> https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/akka/osgi/impl/QuarantinedMonitorActorPropsFactory.java ?
>
> I vaguely remember an "inconsistent cluster" leading to this - can you clarify exactly
> what situation leads to that? Loss of leader? Loss of majority?
>
> asking for https://bugzilla.redhat.com/show_bug.cgi?id=1597304 ...
>

That happens when akka quarantines a node - it can no longer rejoin the
majority cluster unless the actor system is restarted, hence we restart the
whole JVM.


>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>
>
>


Re: [controller-dev] (no subject)

2018-07-02 Thread Tom Pantelis
>
>
> Tom,
> After re-reading my last mail, I see that the data directory is
> cleaned/removed, then, the last step, is to copy what was stashed away back
> to the newly created data dir. From what I see, this is only copying the
> logs from the previous karaf instance back to the newly created dir. So,
> this seems ok, agree?
>
> Here are more details on step #4 above, where the karaf logs are copied to
> /tmp:
>
> mkdir -p '/tmp' && rm -vrf '/tmp/log' && mv -vf '/tmp/karaf-0.8.3-SNAPSHOT/data/log' '/tmp/'
>

Copying the logs is OK. It looks a little odd to see 2 message sequences
for "Lock acquired. Setting startlevel to 100" ... "All initial bundles
installed and set to start" 7 min apart. I wonder if it's possible the
first set was from the previous run and somehow the log got truncated.
Another thing to try is to not copy the logs back so it starts a new log.
Or copy them with a different extension, e.g. append the timestamp
"karaf.log.123456789", so they're at least retained. This would also make
it easier when scanning the logs - otherwise, with them appended, you
have to figure out whether a log message came from the 1st run, 2nd, etc.
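The timestamp-suffix idea could be sketched in Java as below: each run's karaf.log is moved aside as karaf.log.<epoch millis> instead of being appended to by the next run. KarafLogArchiver is a hypothetical name, and the paths in main are assumptions modeled on the mv command quoted above; the real job would use its own layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: keeps each run's karaf.log as a separate, timestamped file. */
public final class KarafLogArchiver {
    private KarafLogArchiver() {
    }

    /**
     * Moves {@code logDir/karaf.log} to {@code archiveDir/karaf.log.<timestampMillis>}.
     * Returns the archived path, or null if there was no log to archive.
     */
    public static Path archive(Path logDir, Path archiveDir, long timestampMillis) throws IOException {
        Path log = logDir.resolve("karaf.log");
        if (!Files.exists(log)) {
            return null;
        }
        Files.createDirectories(archiveDir);
        Path target = archiveDir.resolve("karaf.log." + timestampMillis);
        // Files.move returns the target path; REPLACE_EXISTING guards against a reused timestamp.
        return Files.move(log, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Assumed paths; adjust to the actual CSIT job layout.
        Path logDir = Paths.get("/tmp/karaf-0.8.3-SNAPSHOT/data/log");
        Path archiveDir = Paths.get("/tmp/log-archive");
        System.out.println(archive(logDir, archiveDir, System.currentTimeMillis()));
    }
}
```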



>
>
>
>
>>
>>1.
>>
>>
>>> Other than that, we probably need to get a thread dump.
>>>
 Thanks,

 Vic




 Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main launch
 INFO: Installing and starting initial bundles
 Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main launch
 INFO: All initial bundles installed and set to start
 Jun 29, 2018 3:43:47 PM org.apache.karaf.main.lock.SimpleFileLock lock
 INFO: Trying to lock /tmp/karaf-0.8.3-SNAPSHOT/lock
 Jun 29, 2018 3:43:47 PM org.apache.karaf.main.lock.SimpleFileLock lock
 INFO: Lock acquired
 Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
 INFO: Lock acquired. Setting startlevel to 100
 Jun 29, 2018 3:50:48 PM org.apache.karaf.main.Main launch
 INFO: Installing and starting initial bundles
 Jun 29, 2018 3:50:49 PM org.apache.karaf.main.Main launch
 INFO: All initial bundles installed and set to start
 Jun 29, 2018 3:50:49 PM org.apache.karaf.main.lock.SimpleFileLock lock
 INFO: Trying to lock /tmp/karaf-0.8.3-SNAPSHOT/lock
 Jun 29, 2018 3:50:49 PM org.apache.karaf.main.lock.SimpleFileLock lock
 INFO: Lock acquired
 Jun 29, 2018 3:50:49 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
 INFO: Lock acquired. Setting startlevel to 100





>>>


Re: [controller-dev] (no subject)

2018-07-02 Thread Tom Pantelis
On Mon, Jul 2, 2018 at 2:15 PM, Victor Pickard  wrote:

> Hi all,
>
> I'm looking at clustering stability. One of the jobs I've been looking at is 
> controller clustering. This is a good CSIT, in that it stops and starts ODL 
> several times during the run.
>
> In one of the failed test runs (sandbox, logs wiped from last week, but I do have
> this particular karaf log archived locally), ODL is started, and rest calls
> fail during the test. Looking at the logs, I can see why. Karaf failed to
> start, or rather, took a really long time to start. From the snippet
> below, you can see about 7 mins between when Karaf launched and when it did
> something (maybe restarted again?). But the main thing is that karaf failed to
> start in a timely manner, taking over 7 minutes to begin to start up
> blueprints, etc.
>
>
> I ran a job that had karaf debug logging enabled with this setting:
>
> log4j.rootLogger=DEBUG
>
>
> This did not go very well. This generates way too much debug info, and was 
> causing timeouts and other various errors in the CSIT run.
>
>
> So, my questions are:
>
> 1. Has anyone seen this issue where karaf seems to hang on startup (after a
> kill -9 on karaf pid)? If so, is this a known issue?
>
> 2. What debug would be needed to figure out why karaf was hanging? Note the 
> above generated a log file of ~768 MB in a very short timespan.
>
>
Vic - does this happen if you gracefully shut it down? In years past with
karaf, I recall corruption could occur in the bundle cache under data if the
karaf process was killed. I don't know if that potential issue is still
present with karaf 4. Does it clean the data dir before restarting? If not,
it would be good to do so to be safe.

Other than that, we probably need to get a thread dump.
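The usual way to get that thread dump is externally, e.g. jstack <pid> against the hung karaf process. If a dump has to be captured from inside a JVM instead (say from a test hook), a minimal sketch using the standard java.lang.management API is below; ThreadDumper is a hypothetical name. It walks the stack frames itself because ThreadInfo.toString() truncates long stacks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: builds a text thread dump of the current JVM. */
public final class ThreadDumper {
    private ThreadDumper() {
    }

    /** Returns name, state and full stack of every live thread, with lock info where supported. */
    public static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported())) {
            if (info == null) {
                continue; // defensive: skip threads that could not be inspected
            }
            sb.append('"').append(info.getThreadName()).append("\" state=")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

For a karaf that appears hung at startup, taking two or three dumps some seconds apart shows whether threads are actually stuck or just slow.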

> Thanks,
>
> Vic
>
>
>
>
> Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main launch
> INFO: Installing and starting initial bundles
> Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main launch
> INFO: All initial bundles installed and set to start
> Jun 29, 2018 3:43:47 PM org.apache.karaf.main.lock.SimpleFileLock lock
> INFO: Trying to lock /tmp/karaf-0.8.3-SNAPSHOT/lock
> Jun 29, 2018 3:43:47 PM org.apache.karaf.main.lock.SimpleFileLock lock
> INFO: Lock acquired
> Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
> INFO: Lock acquired. Setting startlevel to 100
> Jun 29, 2018 3:50:48 PM org.apache.karaf.main.Main launch
> INFO: Installing and starting initial bundles
> Jun 29, 2018 3:50:49 PM org.apache.karaf.main.Main launch
> INFO: All initial bundles installed and set to start
> Jun 29, 2018 3:50:49 PM org.apache.karaf.main.lock.SimpleFileLock lock
> INFO: Trying to lock /tmp/karaf-0.8.3-SNAPSHOT/lock
> Jun 29, 2018 3:50:49 PM org.apache.karaf.main.lock.SimpleFileLock lock
> INFO: Lock acquired
> Jun 29, 2018 3:50:49 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
> INFO: Lock acquired. Setting startlevel to 100
>
>
>
>
>


Re: [controller-dev] [integration-dev] cluster failures in controller cars/people job.

2018-06-20 Thread Tom Pantelis
On Wed, Jun 20, 2018 at 5:09 AM, Ariel Adam  wrote:

> Tom, can you provide more information on the implications to the user of
> such a problem?
> What is the "tell-based" and is this something we are fixing in Oxygen
> SR's as well?
>

There has been a lot of discussion in the comments on the JIRA. Please read
through them; hopefully that will answer all your questions.


>
> Thanks.
>
> On Tue, Jun 19, 2018 at 10:18 PM Tom Pantelis 
> wrote:
>
>>
>>
>> On Tue, Jun 19, 2018 at 2:54 PM, Jamo Luhrsen  wrote:
>>
>>> All, we have a newfound drive to hopefully get our clustering
>>> story a little more stable. One part of that is to start at the
>>> bottom and clean our 3node CSIT jobs up and work up. So, I
>>> started with the cars/people job in the controller project.
>>>
>>>
>>> Tom P, I took the liberty to assign you to this new JIRA I just
>>> created:
>>>
>>>   https://jira.opendaylight.org/browse/CONTROLLER-1838
>>
>>
>> I noted it. What you saw is known and unfortunately expected behavior
>> with the ask-based protocol which tell-based promises to alleviate.
>>
>>
>>>
>>>
>>> please feel free to un-assign yourself or move it to someone else.
>>>
>>> I'm ready to do whatever I can with the skills that I have to
>>> help figure this one out.
>>>
>>>
>>> Thanks,
>>> JamO
>>>
>>>
>>
>> ___
>> integration-dev mailing list
>> integration-...@lists.opendaylight.org
>> https://lists.opendaylight.org/mailman/listinfo/integration-dev
>>
>


Re: [controller-dev] cluster failures in controller cars/people job.

2018-06-19 Thread Tom Pantelis
On Tue, Jun 19, 2018 at 2:54 PM, Jamo Luhrsen  wrote:

> All, we have a newfound drive to hopefully get our clustering
> story a little more stable. One part of that is to start at the
> bottom and clean our 3node CSIT jobs up and work up. So, I
> started with the cars/people job in the controller project.
>
>
> Tom P, I took the liberty to assign you to this new JIRA I just
> created:
>
>   https://jira.opendaylight.org/browse/CONTROLLER-1838


I noted it. What you saw is known and unfortunately expected behavior with
the ask-based protocol which tell-based promises to alleviate.


>
>
> please feel free to un-assign yourself or move it to someone else.
>
> I'm ready to do whatever I can with the skills that I have to
> help figure this one out.
>
>
> Thanks,
> JamO
>
>


Re: [controller-dev] [release] Autorelease fluorine failed to build sample-toaster-it from controller

2018-06-19 Thread Tom Pantelis
On Tue, Jun 19, 2018 at 1:44 AM, Ariel Adam  wrote:

> Fluorine looks good now.
> Can you also push it into the Oxygen?
>

Already done - https://git.opendaylight.org/gerrit/#/c/73118/


>
> Thanks.
>
> On Sun, Jun 17, 2018 at 6:15 PM Tom Pantelis 
> wrote:
>
>>
>> On Sun, Jun 17, 2018 at 8:47 AM, Tom Pantelis 
>> wrote:
>>
>>> This appears to be a pax exam issue:
>>>
>>> 2018-06-17T01:54:27,557 | WARN  | pool-2-thread-1  | AetherBasedResolver
>>>   | 2 - org.ops4j.pax.url.mvn - 2.5.4 | Error resolving artifact 
>>> org.ops4j.pax.tipi:org.ops4j.pax.tipi.junit:jar:4.12.0.1: [Could not find 
>>> artifact org.ops4j.pax.tipi:org.ops4j.pax.tipi.junit:jar:4.12.0.1 in 
>>> defaultlocal (file:/tmp/r/), Could not find artifact 
>>> org.ops4j.pax.tipi:org.ops4j.pax.tipi.junit:jar:4.12.0.1 in 
>>> system.repository 
>>> (file:/w/workspace/autorelease-release-oxygen/controller/opendaylight/md-sal/samples/toaster-it/target/exam/f302a2c9-2cc5-40dc-8c2b-02b85c0fe216/system/)]
>>>
>>> No idea why this just started popping up.  It's only happening on
>>> auto-release - of course it doesn't happen locally and I haven't seen
>>> it on verify/merge jobs. Has something changed recently with the auto-release
>>> auto-release
>>> job?
>>>
>>>
>> I was able to repro locally by removing the org.ops4j.pax.tipi.hamcrest.junit
>> and org.ops4j.pax.tipi.hamcrest.core artifacts from my local .m2 repo.
>> Adding the dependencies to the mdsal-it-base pom causes them to be
>> downloaded and fixes the issue. I submitted
>> https://git.opendaylight.org/gerrit/#/c/73069/ - after merge, can someone
>> please re-run AR? If that's good then I'll cherry-pick to oxygen.
>>
>> It's a mystery why this issue suddenly popped up, why we now
>> have to explicitly reference those dependencies, and why this isn't
>> happening on verify jobs. Maybe someone knows of a recent change somewhere
>> that could correlate ...
>>
>>
>>>
>>> On Sun, Jun 17, 2018 at 4:02 AM, Ariel Adam  wrote:
>>>
>>>> Controller project, finally it seems we have a real failure in the
>>>> autorelease.
>>>> The past 3 builds for Fluorine and Oxygen have failed with the same
>>>> error:
>>>>
>>>> *Oxygen*
>>>>
>>>> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project sample-toaster-it: There are test failures.
>>>> 01:57:25 [ERROR]
>>>> 01:57:25 [ERROR] Please refer to /w/workspace/autorelease-release-oxygen/controller/opendaylight/md-sal/samples/toaster-it/target/surefire-reports for the individual test results.
>>>>
>>>>
>>>> *Fluorine*
>>>>
>>>> 02:04:26 testToaster(org.opendaylight.controller.sample.toaster.it.ToasterTest)  Time elapsed: 182.369 sec  <<< ERROR!
>>>> 02:04:26 java.rmi.NotBoundException: 97e69590-abd8-408d-8506-1b505046f5a7
>>>> 02:04:26   at sun.rmi.registry.RegistryImpl.lookup(RegistryImpl.java:227)
>>>> 02:04:26   at sun.rmi.registry.RegistryImpl_Skel.dispatch(RegistryImpl_Skel.java:115)
>>>> 02:04:26   at sun.rmi.server.UnicastServerRef.oldDispatch(UnicastServerRef.java:468)
>>>> 02:04:26   at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:300)
>>>> 02:04:26   at sun.rmi.transport.Transport$1.run(Transport.java:200)
>>>> 02:04:26   at sun.rmi.transport.Transport$1.run(Transport.java:197)
>>>> 02:04:26   at java.security.AccessController.doPrivileged(Native Method)
>>>> 02:04:26   at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
>>>> 02:04:26   at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
>>>> 02:04:26   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:835)
>>>> 02:04:26   at
>>>>
>>>>
>>>> Appreciate your assistance in solving it.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On Sun, Jun 17, 2018 at 5:05 AM, Jenkins >>> opendaylight.org> wrote:
>>>>
>>>>> Attention controller-devs,
>>>>>
>>>>> Autorelease fluorine failed to build sample-t

Re: [controller-dev] [release] Autorelease fluorine failed to build sample-toaster-it from controller

2018-06-17 Thread Tom Pantelis
This appears to be a pax exam issue:

2018-06-17T01:54:27,557 | WARN  | pool-2-thread-1  |
AetherBasedResolver  | 2 - org.ops4j.pax.url.mvn - 2.5.4 |
Error resolving artifact
org.ops4j.pax.tipi:org.ops4j.pax.tipi.junit:jar:4.12.0.1: [Could not
find artifact org.ops4j.pax.tipi:org.ops4j.pax.tipi.junit:jar:4.12.0.1
in defaultlocal (file:/tmp/r/), Could not find artifact
org.ops4j.pax.tipi:org.ops4j.pax.tipi.junit:jar:4.12.0.1 in
system.repository
(file:/w/workspace/autorelease-release-oxygen/controller/opendaylight/md-sal/samples/toaster-it/target/exam/f302a2c9-2cc5-40dc-8c2b-02b85c0fe216/system/)]

No idea why this just started popping up.  It's only happening on
auto-release - of course it doesn't happen locally and I haven't seen it on
verify/merge jobs. Has something changed recently with the auto-release job?


On Sun, Jun 17, 2018 at 4:02 AM, Ariel Adam  wrote:

> Controller project, finally it seems we have a real failure in the
> autorelease.
> The past 3 builds for Fluorine and Oxygen have failed with the same error:
>
> *Oxygen*
>
> [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project sample-toaster-it: There are test failures.
> 01:57:25 [ERROR]
> 01:57:25 [ERROR] Please refer to /w/workspace/autorelease-release-oxygen/controller/opendaylight/md-sal/samples/toaster-it/target/surefire-reports for the individual test results.
>
>
> *Fluorine*
>
> 02:04:26 testToaster(org.opendaylight.controller.sample.toaster.it.ToasterTest)  Time elapsed: 182.369 sec  <<< ERROR!
> 02:04:26 java.rmi.NotBoundException: 97e69590-abd8-408d-8506-1b505046f5a7
> 02:04:26   at sun.rmi.registry.RegistryImpl.lookup(RegistryImpl.java:227)
> 02:04:26   at sun.rmi.registry.RegistryImpl_Skel.dispatch(RegistryImpl_Skel.java:115)
> 02:04:26   at sun.rmi.server.UnicastServerRef.oldDispatch(UnicastServerRef.java:468)
> 02:04:26   at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:300)
> 02:04:26   at sun.rmi.transport.Transport$1.run(Transport.java:200)
> 02:04:26   at sun.rmi.transport.Transport$1.run(Transport.java:197)
> 02:04:26   at java.security.AccessController.doPrivileged(Native Method)
> 02:04:26   at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
> 02:04:26   at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
> 02:04:26   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:835)
> 02:04:26   at
>
>
> Appreciate your assistance in solving it.
>
> Thanks.
>
>
> On Sun, Jun 17, 2018 at 5:05 AM, Jenkins  opendaylight.org> wrote:
>
>> Attention controller-devs,
>>
>> Autorelease fluorine failed to build sample-toaster-it from controller in
>> build
>> 118. Attached is a snippet of the error message related to the
>> failure that we were able to automatically parse as well as console logs.
>>
>>
>> Console Logs:
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/autorelease-release-fluorine/118
>>
>> Jenkins Build:
>> https://jenkins.opendaylight.org/releng/job/autorelease-release-fluorine/118/
>>
>> Please review and provide an ETA on when a fix will be available.
>>
>> Thanks,
>> ODL releng/autorelease team
>>
>>
>> ___
>> release mailing list
>> rele...@lists.opendaylight.org
>> https://lists.opendaylight.org/mailman/listinfo/release
>>
>>
>


Re: [controller-dev] Controller High CPU

2018-06-12 Thread Tom Pantelis
On Tue, Jun 12, 2018 at 10:01 PM, Luis Gomez  wrote:

> OK, then we just wait for the fix.
>


There's more work to do before removing all of the CSS code completely, but
I've been tied up with other stuff. I don't know how
https://git.opendaylight.org/gerrit/#/c/72674 could've caused the
FeatureConfigPusher to use high CPU; that patch just removed all the CSS
yang modules from the controller. If it's an immediate issue then I can
push a quick patch to disable the FeatureConfigPusher.


>
> On Jun 12, 2018, at 6:54 PM, Tom Pantelis  wrote:
>
>
>
> On Tue, Jun 12, 2018 at 9:47 PM, Luis Gomez  wrote:
>
>> Hi all,
>>
>> Today I noticed unusually high CPU in the controller, so I installed the simple
>> feature "odl-restconf", attached a profiler and saw that the class below uses a lot
>> of CPU (see attached):
>>
>> ChildAwareFeatureWrapper.java:81 org.opendaylight.controller.configpusherfeature.internal.ChildAwareFeatureWrapper.getFeatureConfigSnapshotHolders()
>>
>
>
> That's part of the CSS which will be removed shortly.
>
>
>>
>> Because of the above and since the issue showed up today, I think this
>> patch could be the culprit:
>>
>> https://git.opendaylight.org/gerrit/#/c/72674
>>
>
>
>>
>> Just let me know if I should open a bug or what the next steps are, because
>> this must be impacting all projects right now.
>>
>> BR/Luis
>>
>>
>> 
>>
>>
>>
>
>


Re: [controller-dev] OK to resurrect c/64522 to first move infrautils.DiagStatus integration for datastore from genius to controller, and then improve it for GENIUS-138 ?

2018-06-08 Thread Tom Pantelis
On Fri, Jun 8, 2018 at 5:02 PM, Anil Vishnoi  wrote:

>
>
> On Fri, Jun 8, 2018 at 1:49 PM, Tom Pantelis 
> wrote:
>
>>
>>
>> On Fri, Jun 8, 2018 at 3:10 PM, Anil Vishnoi 
>> wrote:
>>
>>>
>>>
>>> On Thu, Jun 7, 2018 at 11:37 AM, Tom Pantelis 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Jun 7, 2018 at 1:14 PM, Michael Vorburger wrote:
>>>>
>>>>> Robert,
>>>>>
>>>>> just to avoid any misunderstandings and unnecessary extra work to
>>>>> throw away, may we double check and confirm that we correctly understand
>>>>> your comment in  https://jira.opendaylight.org/browse/GENIUS-138 to
>>>>> mean that we are past the "dependency of a mature project on an incubation
>>>>> project" objection and you are now OK with that we resurrect
>>>>> https://git.opendaylight.org/gerrit/#/c/64522/, to first move
>>>>> infrautils.DiagStatus integration for datastore from genius to controller?
>>>>> We would then improve it, in controller instead of genius, for the
>>>>> improvement proposed in issue GENIUS-138.
>>>>>
>>>>> Tom, OK for you as well to have such a dependency from controller to
>>>>> infrautils?
>>>>>
>>>>
>>>> I don't have a problem with it.
>>>>
>>>> BTW - I'm planning to add yang notifications to CDS to emit interesting
>>>> state/status changes, eg akka member state changes (Up, Down, Unreachable
>>>> etc), shard leader/role changes 
>>>>
>>> Tom, is there a JIRA ticket where we can get some details about it?
>>> Are these yang notifications going to be local or routed?
>>>
>>>
>>
>> All yang notifications are local - not sure what you mean by routed.
>>
> I mean, like we route RPCs, I was wondering if you are building
> something that will route the yang notifications to other nodes as well.
>
>


>
>> My intention for these yang notifications is for telemetry, alarming...
>>
>>
> Okay, that makes sense. I was looking for a scenario where, in a 3-node
> cluster, the shard leader moves from controller-1 to controller-3: will
> controller-2 know about that? As of now I'm not sure whether that use case
> is required or not, which is why I'm more interested in the details to see
> what's coming :)
>

Yes, controller-2 would already know about that shard leader move and could
emit an informational yang notification.



>
>>
>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Tx,
>>>>> M.
>>>>> --
>>>>> Michael Vorburger, Red Hat
>>>>> vorbur...@redhat.com | IRC: vorburger @freenode | ~ =
>>>>> http://vorburger.ch
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks
>>> Anil
>>>
>>
>>
>
>
> --
> Thanks
> Anil
>


Re: [controller-dev] Toaster-consumer in odl-startup-archetype directory structure

2018-06-08 Thread Tom Pantelis
Pranjal,

The archetype generates an initial suggested directory layout. There are really
no rules on how to organize the code; in the end it's your code and it's up to
you. The toaster example has the RPC implementation in a provider bundle
and an RPC user in a consumer bundle, but it's just an example. If you have
RPCs, the user/consumer may just be the NB restconf.
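To make the provider/consumer split concrete, here is a minimal plain-Java sketch of how the pieces could relate. The names `ToasterService`, `ToasterProvider` and `KitchenConsumer` are hypothetical stand-ins: in a real MD-SAL project the RPC interface is generated from YANG into the api bundle, and the container (e.g. blueprint) wires the implementation in, rather than a `main` method.

```java
import java.util.concurrent.CompletableFuture;

public class ToasterLayoutSketch {
    // "api" module: in a real project this RPC interface is generated from
    // the YANG model; this hand-written interface is a simplified stand-in.
    interface ToasterService {
        CompletableFuture<String> makeToast(int doneness);
    }

    // "impl" (provider) module: implements the RPC.
    static final class ToasterProvider implements ToasterService {
        @Override
        public CompletableFuture<String> makeToast(int doneness) {
            return CompletableFuture.completedFuture("toast@" + doneness);
        }
    }

    // Optional consumer module: depends only on the api interface, never on impl.
    static final class KitchenConsumer {
        private final ToasterService toaster;

        KitchenConsumer(ToasterService toaster) {
            this.toaster = toaster;
        }

        String makeBreakfast() {
            // join() blocks until the RPC future completes.
            return "breakfast with " + toaster.makeToast(5).join();
        }
    }

    public static void main(String[] args) {
        // In OSGi the container would inject the provider into the consumer;
        // here the wiring is done by hand.
        KitchenConsumer consumer = new KitchenConsumer(new ToasterProvider());
        System.out.println(consumer.makeBreakfast()); // prints "breakfast with toast@5"
    }
}
```

If the only caller is RESTCONF, the consumer module simply disappears and the api/impl pair is all you need.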

Tom

On Fri, Jun 8, 2018 at 12:52 AM, Pranjal Sharma  wrote:

> Hi All,
>
> I hope you are doing well. I am fairly new to the OpenDaylight framework
> and trying to get a finer understanding of the MD-SAL infrastructure. To
> get started, I am trying to develop the complete *Toaster* example from
> scratch using *odl-startup-archetype*. I am following the tutorial:
> https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:Toaster_Step-By-Step.
>
> Before starting the Toaster example, I have developed a simple MD-SAL
> application that had a provider but no in-project consumer. So, I used to
> put all the yangs in *api* directory, and the providers in the *impl*
> directory generated by the odl-startup archetype.
>
> With Toaster, I am struggling to understand where to place the consumer in
> the directory structure generated by the odl-startup-archetype. Does the
> Consumer go into the *impl* directory too (probably in a sub-directory
> for consumer), as it is exposing some functionality itself? I have seen the
> code of Toaster on GitHub, but that didn't clarify my doubt.
>
> I will really appreciate any help regarding this. I look forward to your
> response in the matter.
>
>
> Thanks,
> Pranjal Sharma
>
>
>
>
>


Re: [controller-dev] OK to resurrect c/64522 to first move infrautils.DiagStatus integration for datastore from genius to controller, and then improve it for GENIUS-138 ?

2018-06-08 Thread Tom Pantelis
On Fri, Jun 8, 2018 at 9:58 AM, Michael Vorburger 
wrote:

> Tom,
>
> On Thu, Jun 7, 2018 at 8:37 PM, Tom Pantelis 
> wrote:
>
>> On Thu, Jun 7, 2018 at 1:14 PM, Michael Vorburger 
>> wrote:
>>
>>> Robert,
>>>
>>> just to avoid any misunderstandings and unnecessary extra work to throw
>>> away, may we double check and confirm that we correctly understand your
>>> comment in  https://jira.opendaylight.org/browse/GENIUS-138 to mean
>>> that we are past the "dependency of a mature project on an incubation
>>> project" objection and you are now OK with that we resurrect
>>> https://git.opendaylight.org/gerrit/#/c/64522/, to first move
>>> infrautils.DiagStatus integration for datastore from genius to controller?
>>> We would then improve it, in controller instead of genius, for the
>>> improvement proposed in issue GENIUS-138.
>>>
>>> Tom, OK for you as well to have such a dependency from controller to
>>> infrautils?
>>>
>>
>> I don't have a problem with it.
>>
>
> please forget I asked... ;-)
>
>
>> BTW - I'm planning to add yang notifications to CDS to emit interesting
>> state/status changes, eg akka member state changes (Up, Down, Unreachable
>> etc), shard leader/role changes 
>>
>
> that sounds interesting, do you have / would you like to create a JIRA we
> can watch re. this?
>

I haven't yet.


>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>
>
> Tx,
>>> M.
>>> --
>>> Michael Vorburger, Red Hat
>>> vorbur...@redhat.com | IRC: vorburger @freenode | ~ =
>>> http://vorburger.ch
>>>
>>>
>>>
>>
>


[controller-dev] Intermittent "Unable to create cache directory..." failures

2018-06-07 Thread Tom Pantelis
Hello,

We've been seeing these intermittent netconf failures on jenkins:

 Caused by: java.lang.IllegalArgumentException: Unable to create cache directory at cache/schema
  at com.google.common.base.Preconditions.checkArgument(Preconditions.java:210)
  at org.opendaylight.yangtools.yang.model.repo.util.FilesystemSchemaSourceCache.<init>(FilesystemSchemaSourceCache.java:74)
  at org.opendaylight.netconf.topology.AbstractNetconfTopology.<init>(AbstractNetconfTopology.java:149)

In FilesystemSchemaSourceCache, it does this:

  if (!storageDirectory.exists()) {
      checkArgument(storageDirectory.mkdirs(), "Unable to create cache directory at %s",
          storageDirectory);
  }

mkdirs returns false if the dir/file already exists. I think there's a race
condition where some other code on another thread interleaves and creates
the dir in between the exists and mkdirs calls. As a workaround, I
submitted https://git.opendaylight.org/gerrit/#/c/72775/ that adds retries
if FilesystemSchemaSourceCache throws an IAE and also no longer fails class
initialization on failure. So even if there's another strange reason mkdirs
returns false (e.g. permissions, although highly unlikely), at least it won't
fail SFT.
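The retry part of that workaround can be sketched as a small helper that re-invokes a factory when it throws `IllegalArgumentException`. This is a hypothetical illustration that assumes the failure is transient; `retryOnIae` is not the code from the Gerrit change above, and the lambda in `main` merely simulates two failed `mkdirs()` attempts.

```java
import java.util.function.Supplier;

public class RetryOnIae {
    // Hypothetical helper: calls the factory up to maxAttempts times, retrying only
    // when it throws IllegalArgumentException (what FilesystemSchemaSourceCache's
    // constructor throws when mkdirs() fails). Other exceptions propagate at once.
    static <T> T retryOnIae(Supplier<T> factory, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IllegalArgumentException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return factory.get();
            } catch (IllegalArgumentException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed: surface the last error
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice, then succeeds, mimicking a transient mkdirs() race.
        String result = retryOnIae(() -> {
            if (++calls[0] < 3) {
                throw new IllegalArgumentException("Unable to create cache directory");
            }
            return "cache-ready";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts"); // cache-ready after 3 attempts
    }
}
```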

Assuming my theory is correct,  changing FilesystemSchemaSourceCache to:

  checkArgument(storageDirectory.mkdirs() || storageDirectory.exists(),
      "Unable to create cache directory at %s", storageDirectory);

would alleviate the issue. I'll push that change to yangtools as well
although it won't be available until the next release.
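The theory can be exercised with a small self-contained Java program that releases many threads at once against the same directory path. `racyEnsureDir` mirrors the check-then-act pattern quoted above and `safeEnsureDir` uses the proposed `mkdirs() || exists()` check; both helper names are made up for this sketch, and whether the racy variant actually fails on a given run depends on thread timing.

```java
import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MkdirsRace {
    // Pattern from the current code: check-then-act. If another thread creates
    // the directory between exists() and mkdirs(), mkdirs() returns false.
    static boolean racyEnsureDir(File dir) {
        if (!dir.exists()) {
            return dir.mkdirs();
        }
        return true;
    }

    // Proposed pattern: a false from mkdirs() is fine if the directory exists afterwards.
    static boolean safeEnsureDir(File dir) {
        return dir.mkdirs() || dir.exists();
    }

    // Starts all threads at once on the same path and counts how many report failure.
    static int countFailures(File dir, int threads, boolean useSafe) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<Boolean>> results = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            results.add(pool.submit(() -> {
                start.await(); // line everyone up to widen the race window
                return useSafe ? safeEnsureDir(dir) : racyEnsureDir(dir);
            }));
        }
        start.countDown();
        int failures = 0;
        try {
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    failures++;
                }
            }
        } finally {
            pool.shutdown();
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        File base = Files.createTempDirectory("schema-cache-demo").toFile();
        // The racy variant may or may not fail on a given run; the safe one should not.
        System.out.println("racy failures: " + countFailures(new File(base, "racy/cache/schema"), 16, false));
        System.out.println("safe failures: " + countFailures(new File(base, "safe/cache/schema"), 16, true));
    }
}
```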

Tom


Re: [controller-dev] yang notifications to CDS to emit interesting state/status changes

2018-06-07 Thread Tom Pantelis
On Thu, Jun 7, 2018 at 3:38 PM, Faseela K  wrote:

>
>
>
>
> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
> *Sent:* Friday, June 08, 2018 1:05 AM
> *To:* Faseela K 
> *Cc:* Michael Vorburger ; controller-dev <
> controller-dev@lists.opendaylight.org>; genius-...@lists.opendaylight.org;
> Robert Varga 
> *Subject:* Re: yang notifications to CDS to emit interesting state/status
> changes
>
>
>
>
>
>
>
> On Thu, Jun 7, 2018 at 3:25 PM, Tom Pantelis 
> wrote:
>
>
>
>
>
> On Thu, Jun 7, 2018 at 3:20 PM, Faseela K  wrote:
>
> Yes, we are currently using EOS in such scenarios. But is there a way to
> specify that I want my entity owner to be on “default-config shard leader”?
>
>
>
> No  - EOS and shards are different concepts. But you mentioned there's
> DTCN's from different shards involved, say default-config and
> default-operational, so that forcing owner to the default-config shard
> leader wouldn't help you - access to the default-operational shard would
> still be remote.
>
>
>
> >> I do have some cases where as a result of the 2 events, whatever I
> have to read/write are from one shard only. In such cases, it helps if
> there is a way to force the processing on a specific node of my choice.
>
>
>
> That’s why I was asking whether the new notifications you are going to add
> will help in implementing such functionalities.
>
>
>
> They're not related.
>
>
>
> Actually the notifications theoretically could be used for some service
> placement component as Robert mentioned.
>
>   >> Yes, I was planning to shoot an email to controller-dev about the
> same thing whatever you are trying to implement. I can even have a cache in
> my module, which stores the shard leader information, and use clustered
> DTCNs and process the event only on the node which is the leader.
>


> The cache can be updated every time a notification comes for
> shard leader change. Looks similar to EOS, but I am not sure how robust it
> will be if shards are moving, or in a jeopardy state.
>
>
>

That could get tricky - there may be issues with transitions and
notification delays/timing where possibly no one processes an event at the
time it's received. Also that seems like it would tie the code to a
specific shard configuration which may not be desirable.


Perhaps what you want is a DataBroker-like API that only performs the
operation if the shard leader is local. So all nodes get a DTCN and commit
a write operation but only one node actually executes it.
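A rough sketch of that idea, assuming a hypothetical `ShardLeaderOracle` hook that tells a node whether it hosts the relevant shard leader (CDS does not offer this exact API; the class and member names are illustrative only): every node receives the change, but only the node that believes it hosts the leader executes the write.

```java
public class LeaderLocalWriter {
    // Hypothetical hook: does this node host the leader of the given shard?
    // This is a stand-in, not an existing CDS interface.
    interface ShardLeaderOracle {
        boolean isLocalLeader(String shardName);
    }

    private final ShardLeaderOracle oracle;

    LeaderLocalWriter(ShardLeaderOracle oracle) {
        this.oracle = oracle;
    }

    // Every node calls this from its clustered listener; only the node hosting
    // the shard leader executes the write, the others skip it.
    boolean writeIfLeaderLocal(String shardName, Runnable write) {
        if (!oracle.isLocalLeader(shardName)) {
            return false;
        }
        write.run();
        return true;
    }

    // Simulates one change event delivered to every node and counts executed writes.
    static int deliverToAll(String[] nodes, String leaderNode, String shardName) {
        int[] writes = {0};
        for (String node : nodes) {
            LeaderLocalWriter writer = new LeaderLocalWriter(shard -> node.equals(leaderNode));
            writer.writeIfLeaderLocal(shardName, () -> writes[0]++);
        }
        return writes[0];
    }

    public static void main(String[] args) {
        String[] nodes = {"member-1", "member-2", "member-3"};
        System.out.println(deliverToAll(nodes, "member-2", "default-config")); // prints 1
    }
}
```

The simulation assumes every node agrees on who the leader is. As noted above, during leader transitions that assumption can fail, and an event may end up processed by no node at all.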


Re: [controller-dev] yang notifications to CDS to emit interesting state/status changes

2018-06-07 Thread Tom Pantelis
On Thu, Jun 7, 2018 at 3:20 PM, Faseela K  wrote:

> Yes, we are currently using EOS in such scenarios. But is there a way to
> specify that I want my entity owner to be on “default-config shard leader”?
>

No - EOS and shards are different concepts. But you mentioned there are
DTCNs from different shards involved, say default-config and
default-operational, so forcing the owner to the default-config shard
leader wouldn't help you; access to the default-operational shard would
still be remote.


> That’s why I was asking whether the new notifications you are going to add
> will help in implementing such functionalities.
>

They're not related.


>
>
> Thanks,
>
> Faseela
>
>
>
> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
> *Sent:* Friday, June 08, 2018 12:41 AM
> *To:* Faseela K 
> *Cc:* Michael Vorburger ; controller-dev <
> controller-dev@lists.opendaylight.org>; genius-...@lists.opendaylight.org;
> Robert Varga 
> *Subject:* Re: yang notifications to CDS to emit interesting state/status
> changes
>
>
>
>
>
>
>
> On Thu, Jun 7, 2018 at 3:05 PM, Faseela K  wrote:
>
> [Changed subject]
>
>
>
> No, that’s not the point.
>
> I have event A that can be fired on nodeA, event B fired on node B, both
> are DTCNs but on different shard leaders.
>
> But I want both the events to be processed on same node.
>
>
>
> That's where EOS/cluster singleton come into play. With cluster singleton
> you spin up the DTCLs only when the node gets ownership.
>
>
>
>
>
> Thanks,
>
> Faseela
>
>
>
> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
> *Sent:* Friday, June 08, 2018 12:32 AM
> *To:* Faseela K 
> *Cc:* Michael Vorburger ; infrautils-dev@lists.
> opendaylight.org; controller-dev ;
> genius-...@lists.opendaylight.org; Robert Varga 
> *Subject:* Re: [infrautils-dev] [controller-dev] OK to resurrect c/64522
> to first move infrautils.DiagStatus integration for datastore from genius
> to controller, and then improve it for GENIUS-138 ?
>
>
>
>
>
>
>
> On Thu, Jun 7, 2018 at 2:49 PM, Faseela K  wrote:
>
> Tom,
>
>   Currently we have certain cases, where we use EOS to ensure that we
> process a set of northbound+southbound events on same node.
>
>   (I am not sure whether that is the actual purpose of EOS, but we use it
> like that as well. ;))
>
>   This has certain issues that in a 3 node cluster, your entity owner
> might be node2, but the datastores you are writing to as a result of the
> event has a leader on node1, and the writes will end up being slow. So if I
> have a mechanism to force default-operational shard DTCNs to be processed
> on the leader of default-config-shard(if my writes as a result of the
> notifications is going to be config shard writes), I would like to use
> that.(I am not sure whether I made it clear, we can discuss this in our
> next genius meeting as well. I can point you to some usages in genius.)
>
> Thanks,
>
> Faseela
>
>
>
> You can use a DataTreeChangeListener rather than a
> ClusteredDataTreeChangeListener. The former is only notified on the shard
> leader and thus only one in the cluster.
>
>
>
>
>
> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
> *Sent:* Friday, June 08, 2018 12:13 AM
> *To:* Faseela K 
> *Cc:* Michael Vorburger ; infrautils-dev@lists.
> opendaylight.org; controller-dev ;
> genius-...@lists.opendaylight.org; Robert Varga 
>
>
> *Subject:* Re: [infrautils-dev] [controller-dev] OK to resurrect c/64522
> to first move infrautils.DiagStatus integration for datastore from genius
> to controller, and then improve it for GENIUS-138 ?
>
>
>
>
>
>
>
> On Thu, Jun 7, 2018 at 2:39 PM, Faseela K  wrote:
>
> Not related in this context, but if we can get shard leader change
> notification, can we use that to derive an entity owner instead of using
> EOS? ;)
>
>
>
> Not exactly sure what you mean but shards and EOS are 2 different
> concepts...
>
>
>
>
>
>
>


Re: [controller-dev] yang notifications to CDS to emit interesting state/status changes

2018-06-07 Thread Tom Pantelis
On Thu, Jun 7, 2018 at 3:05 PM, Faseela K  wrote:

> [Changed subject]
>
>
>
> No, that’s not the point.
>
> I have event A that can be fired on nodeA, event B fired on node B, both
> are DTCNs but on different shard leaders.
>
> But I want both the events to be processed on same node.
>

That's where EOS/cluster singleton come into play. With cluster singleton
you spin up the DTCLs only when the node gets ownership.
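The cluster-singleton pattern can be sketched in plain Java. The lifecycle method names follow MD-SAL's `ClusterSingletonService` (`instantiateServiceInstance()` / `closeServiceInstance()`, simplified here to return `void`), but `FakeBroker`, `Registration` and the listener paths are hypothetical stand-ins so the example runs without ODL. The point is that both listeners are registered only on the node currently holding ownership, so events from different shards are handled on that same node.

```java
import java.util.ArrayList;
import java.util.List;

public class SingletonListenerDemo {
    // Stand-in for a listener registration handle (not the MD-SAL type).
    interface Registration extends AutoCloseable {
        @Override
        void close();
    }

    // Stand-in for a data broker that just records which paths have listeners.
    static final class FakeBroker {
        final List<String> active = new ArrayList<>();

        Registration registerListener(String path) {
            active.add(path);
            return () -> active.remove(path);
        }
    }

    // Lifecycle loosely modeled on ClusterSingletonService: the framework calls
    // instantiateServiceInstance() on the node that wins ownership and
    // closeServiceInstance() when that node loses it.
    static final class MyAppSingleton {
        private final FakeBroker broker;
        private final List<Registration> registrations = new ArrayList<>();

        MyAppSingleton(FakeBroker broker) {
            this.broker = broker;
        }

        void instantiateServiceInstance() {
            // Both listeners live on the owner node only, so events from the
            // config and operational shards are handled on the same node.
            registrations.add(broker.registerListener("config:/my-app/a"));
            registrations.add(broker.registerListener("operational:/my-app/b"));
        }

        void closeServiceInstance() {
            registrations.forEach(Registration::close);
            registrations.clear();
        }
    }

    public static void main(String[] args) {
        FakeBroker[] brokers = {new FakeBroker(), new FakeBroker(), new FakeBroker()};
        MyAppSingleton[] apps = new MyAppSingleton[brokers.length];
        for (int i = 0; i < brokers.length; i++) {
            apps[i] = new MyAppSingleton(brokers[i]);
        }
        apps[1].instantiateServiceInstance(); // node 1 becomes owner
        apps[1].closeServiceInstance();       // ownership moves away from node 1 ...
        apps[2].instantiateServiceInstance(); // ... to node 2
        for (int i = 0; i < brokers.length; i++) {
            System.out.println("node " + i + " listeners: " + brokers[i].active);
        }
    }
}
```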


>
>
> Thanks,
>
> Faseela
>
>
>
> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
> *Sent:* Friday, June 08, 2018 12:32 AM
> *To:* Faseela K 
> *Cc:* Michael Vorburger ; infrautils-dev@lists.
> opendaylight.org; controller-dev ;
> genius-...@lists.opendaylight.org; Robert Varga 
> *Subject:* Re: [infrautils-dev] [controller-dev] OK to resurrect c/64522
> to first move infrautils.DiagStatus integration for datastore from genius
> to controller, and then improve it for GENIUS-138 ?
>
>
>
>
>
>
>
> On Thu, Jun 7, 2018 at 2:49 PM, Faseela K  wrote:
>
> Tom,
>
>   Currently we have certain cases, where we use EOS to ensure that we
> process a set of northbound+southbound events on same node.
>
>   (I am not sure whether that is the actual purpose of EOS, but we use it
> like that as well. ;))
>
>   This has certain issues that in a 3 node cluster, your entity owner
> might be node2, but the datastores you are writing to as a result of the
> event has a leader on node1, and the writes will end up being slow. So if I
> have a mechanism to force default-operational shard DTCNs to be processed
> on the leader of default-config-shard(if my writes as a result of the
> notifications is going to be config shard writes), I would like to use
> that.(I am not sure whether I made it clear, we can discuss this in our
> next genius meeting as well. I can point you to some usages in genius.)
>
> Thanks,
>
> Faseela
>
>
>
> You can use a DataTreeChangeListener rather than a
> ClusteredDataTreeChangeListener. The former is only notified on the shard
> leader and thus only one in the cluster.
>
>
>
>
>
> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
> *Sent:* Friday, June 08, 2018 12:13 AM
> *To:* Faseela K 
> *Cc:* Michael Vorburger ; infrautils-dev@lists.
> opendaylight.org; controller-dev ;
> genius-...@lists.opendaylight.org; Robert Varga 
>
>
> *Subject:* Re: [infrautils-dev] [controller-dev] OK to resurrect c/64522
> to first move infrautils.DiagStatus integration for datastore from genius
> to controller, and then improve it for GENIUS-138 ?
>
>
>
>
>
>
>
> On Thu, Jun 7, 2018 at 2:39 PM, Faseela K  wrote:
>
> Not related in this context, but if we can get shard leader change
> notification, can we use that to derive an entity owner instead of using
> EOS? ;)
>
>
>
> Not exactly sure what you mean but shards and EOS are 2 different
> concepts...
>
>
>
>
>
>


Re: [controller-dev] [infrautils-dev] OK to resurrect c/64522 to first move infrautils.DiagStatus integration for datastore from genius to controller, and then improve it for GENIUS-138 ?

2018-06-07 Thread Tom Pantelis
On Thu, Jun 7, 2018 at 2:49 PM, Faseela K  wrote:

> Tom,
>
>   Currently we have certain cases where we use EOS to ensure that we
> process a set of northbound+southbound events on the same node.
>
>   (I am not sure whether that is the actual purpose of EOS, but we use it
> like that as well. ;))
>
>   The problem is that in a 3-node cluster, the entity owner might be
> node2 while the shard you write to as a result of the event has its
> leader on node1, so the writes end up being slow. So if I had a mechanism
> to force default-operational shard DTCNs to be processed on the leader of
> the default-config shard (when the writes resulting from the notifications
> are config shard writes), I would like to use that. (I am not sure whether
> I made it clear; we can discuss this in our next genius meeting as well. I
> can point you to some usages in genius.)
>
> Thanks,
>
> Faseela
>

You can use a DataTreeChangeListener rather than a
ClusteredDataTreeChangeListener. The former is only notified on the shard
leader, so only one listener in the cluster gets notified.
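To make the delivery difference concrete, here is a toy model in plain Java. It does not use the MD-SAL API: `ListenerDeliveryModel`, `Kind`, `Registration` and `notifiedNodes` are hypothetical names, and the model only encodes the rule stated in this thread, namely that a plain DataTreeChangeListener fires on the node hosting the shard leader while a ClusteredDataTreeChangeListener fires on every node where it is registered.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (hypothetical names, not the MD-SAL API): which nodes see a
// change notification for one shard, given where each listener is registered.
final class ListenerDeliveryModel {

    /** The two listener flavours discussed in this thread. */
    enum Kind { DATA_TREE_CHANGE_LISTENER, CLUSTERED_DATA_TREE_CHANGE_LISTENER }

    /** One listener registration: the node it lives on and its flavour. */
    static final class Registration {
        final String node;
        final Kind kind;

        Registration(String node, Kind kind) {
            this.node = node;
            this.kind = kind;
        }
    }

    private ListenerDeliveryModel() { }

    /**
     * Returns the nodes whose listeners are notified of a change to a shard
     * whose leader currently runs on {@code shardLeaderNode}.
     */
    static List<String> notifiedNodes(List<Registration> registrations, String shardLeaderNode) {
        List<String> notified = new ArrayList<>();
        for (Registration r : registrations) {
            boolean clustered = r.kind == Kind.CLUSTERED_DATA_TREE_CHANGE_LISTENER;
            // Plain listeners only fire where the shard leader is;
            // clustered listeners fire on every node they are registered on.
            if (clustered || r.node.equals(shardLeaderNode)) {
                notified.add(r.node);
            }
        }
        return notified;
    }
}
```

With the same listener registered on node1, node2 and node3 and the shard leader on node1, the plain flavour is notified only on node1, while the clustered flavour is notified on all three. In this model, if shard leadership moves to node2, the plain listener's notifications move with it.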


>
>
> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
> *Sent:* Friday, June 08, 2018 12:13 AM
> *To:* Faseela K 
> *Cc:* Michael Vorburger ; infrautils-dev@lists.
> opendaylight.org; controller-dev ;
> genius-...@lists.opendaylight.org; Robert Varga 
>
> *Subject:* Re: [infrautils-dev] [controller-dev] OK to resurrect c/64522
> to first move infrautils.DiagStatus integration for datastore from genius
> to controller, and then improve it for GENIUS-138 ?
>
>
>
>
>
>
>
> On Thu, Jun 7, 2018 at 2:39 PM, Faseela K  wrote:
>
> Not related in this context, but if we can get shard leader change
> notification, can we use that to derive an entity owner instead of using
> EOS? ;)
>
>
>
> Not exactly sure what you mean but shards and EOS are 2 different
> concepts...
>
>
>
>
>


Re: [controller-dev] [infrautils-dev] OK to resurrect c/64522 to first move infrautils.DiagStatus integration for datastore from genius to controller, and then improve it for GENIUS-138 ?

2018-06-07 Thread Tom Pantelis
On Thu, Jun 7, 2018 at 2:39 PM, Faseela K  wrote:

> Not related in this context, but if we can get shard leader change
> notification, can we use that to derive an entity owner instead of using
> EOS? ;)
>

Not exactly sure what you mean but shards and EOS are 2 different
concepts...


>
>
> Thanks,
>
> Faseela
>
>
>
> *From:* infrautils-dev-boun...@lists.opendaylight.org [mailto:
> infrautils-dev-boun...@lists.opendaylight.org] *On Behalf Of *Tom Pantelis
> *Sent:* Friday, June 08, 2018 12:07 AM
> *To:* Michael Vorburger 
> *Cc:* infrautils-...@lists.opendaylight.org; controller-dev <
> controller-dev@lists.opendaylight.org>; genius-...@lists.opendaylight.org;
> Robert Varga 
> *Subject:* Re: [infrautils-dev] [controller-dev] OK to resurrect c/64522
> to first move infrautils.DiagStatus integration for datastore from genius
> to controller, and then improve it for GENIUS-138 ?
>
>
>
>
>
>
>
> On Thu, Jun 7, 2018 at 1:14 PM, Michael Vorburger 
> wrote:
>
> Robert,
>
>
>
> just to avoid any misunderstandings and unnecessary extra work to throw
> away, may we double-check and confirm that we correctly understand your
> comment in https://jira.opendaylight.org/browse/GENIUS-138 to mean that
> we are past the "dependency of a mature project on an incubation project"
> objection and that you are now OK with us resurrecting
> https://git.opendaylight.org/gerrit/#/c/64522/ to first move infrautils.DiagStatus
> integration for datastore from genius to controller? We would then improve
> it, in controller instead of genius, for the improvement proposed in issue
> GENIUS-138.
>
>
>
> Tom, OK for you as well to have such a dependency from controller to
> infrautils?
>
>
>
> I don't have a problem with it.
>
>
>
> BTW - I'm planning to add YANG notifications to CDS to emit interesting
> state/status changes, e.g. akka member state changes (Up, Down, Unreachable,
> etc.), shard leader/role changes
>
>
>
>
>
>
>
>
> Tx,
>
> M.
>
> --
>
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>
>
>
>
>


Re: [controller-dev] OK to resurrect c/64522 to first move infrautils.DiagStatus integration for datastore from genius to controller, and then improve it for GENIUS-138 ?

2018-06-07 Thread Tom Pantelis
On Thu, Jun 7, 2018 at 1:14 PM, Michael Vorburger 
wrote:

> Robert,
>
> just to avoid any misunderstandings and unnecessary extra work to throw
> away, may we double-check and confirm that we correctly understand your
> comment in https://jira.opendaylight.org/browse/GENIUS-138 to mean that
> we are past the "dependency of a mature project on an incubation project"
> objection and that you are now OK with us resurrecting
> https://git.opendaylight.org/gerrit/#/c/64522/ to first move infrautils.DiagStatus
> integration for datastore from genius to controller? We would then improve
> it, in controller instead of genius, for the improvement proposed in issue
> GENIUS-138.
>
> Tom, OK for you as well to have such a dependency from controller to
> infrautils?
>

I don't have a problem with it.

BTW - I'm planning to add YANG notifications to CDS to emit interesting
state/status changes, e.g. akka member state changes (Up, Down, Unreachable,
etc.), shard leader/role changes




>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>
>
>


Re: [controller-dev] Exception handling in JAX RS ?

2018-06-06 Thread Tom Pantelis
On Wed, Jun 6, 2018 at 12:36 PM, Michael Vorburger 
wrote:

> Tom, Stephen, anyone else who wants to chime in on this on controller-dev
> (not because this is particularly specific to controller-dev, just thought
> of others on this list; also somewhere in netconf/restconf there must be
> something similar, how is the same handled there?),
>
> Two Qs re. my WIP (!) in https://git.opendaylight.org/gerrit/#/c/72735/ :
>
> 1. does what I have there so far look about right to you? (More from a
> general "how to correctly do exception handling in Java" than neutron PoV
> ...)
>
> 2. With this, I have 159 errors in all of those JAX RS @Path annotated
> classes with their @GET @Produces @StatusCodes kind of methods. What's the
> right thing to do in JAX RS re. the 
> ReadFailedException/OperationFailedException
> which (now) need to be handled? Is it:
>
> a) just add "throws OperationFailedException" to all those JAX RS
> methods? Will JAX RS catch it, log it via slf4j, and return an HTTP 500 with
> the exception in the response body?
>
> b) catch and rethrow, 159 times, as a javax.ws.rs.WebApplicationException?
> Or subclass of it, similar to the 
> org.opendaylight.neutron.northbound.api.ResourceNotFoundException,
> creating a say org.opendaylight.neutron.northbound.api.
> DatastoreOperationFailedWebApplicationException kind of class?
>

Subclass WebApplicationException, e.g. DatastoreOperationFailedException, and
have the CRUD APIs throw that instead of ReadFailedException etc. Since it
would be unchecked, you wouldn't have to change the 159 call sites.
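A minimal sketch of that suggestion, written in plain Java so it runs without JAX-RS on the classpath. Every type here is a hypothetical stand-in: `ReadFailedException` mimics the checked datastore exception, `DatastoreOperationFailedException` would really extend `javax.ws.rs.WebApplicationException` (here it extends `RuntimeException` and just records a 500 status), and `readOrThrow` stands for the shared CRUD helper that the resource methods call.

```java
// Sketch only: all names are hypothetical stand-ins. In the real code the
// unchecked type would extend javax.ws.rs.WebApplicationException.
final class DatastoreExceptionMapping {

    /** Stand-in for a checked datastore exception such as ReadFailedException. */
    static final class ReadFailedException extends Exception {
        ReadFailedException(String message) {
            super(message);
        }
    }

    /** Unchecked wrapper, so resource methods need no throws clause. */
    static final class DatastoreOperationFailedException extends RuntimeException {
        final int httpStatus;

        DatastoreOperationFailedException(Throwable cause) {
            super(cause.getMessage(), cause);
            this.httpStatus = 500; // Internal Server Error
        }
    }

    /** A datastore operation that may fail with the checked exception. */
    interface CheckedRead<T> {
        T read() throws ReadFailedException;
    }

    private DatastoreExceptionMapping() { }

    /** The single place that translates checked failures; call sites stay as they are. */
    static <T> T readOrThrow(CheckedRead<T> op) {
        try {
            return op.read();
        } catch (ReadFailedException e) {
            throw new DatastoreOperationFailedException(e);
        }
    }
}
```

Because the wrapper is unchecked, the JAX-RS resource methods keep their existing signatures and only the shared CRUD helper changes. With the real `WebApplicationException`, the JAX-RS runtime builds the HTTP response from the exception's status.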


>
> c) catch and return Response.status(500) myself? 159 times?
>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>


Re: [controller-dev] [WEATHER] TSC-112 : OK to merge remove c/71801 DataChangeListener and friends?

2018-05-28 Thread Tom Pantelis
On Mon, May 28, 2018 at 7:14 AM, Michael Vorburger 
wrote:

> Hello,
>
> could anyone with an objection to merging Tom's
> https://git.opendaylight.org/gerrit/#/c/71801/ please say so, and why, within 48
> hours?
>
> As noted in https://jira.opendaylight.org/browse/TSC-112, the multipatch
> job https://jenkins.opendaylight.org/releng/job/integration-multipatch-test-fluorine/65/
> proves that this does not break any managed projects that are part of autorelease.
>
> It may, of course, break un-managed projects, or downstream in-house code,
> which would have to be adjusted accordingly.
>

> Without hearing back strong objections, I will merge c/71801 this
> Thursday into controller master.
>

All upstream projects (including un-managed ones) were converted to
DataTreeChangeListener a while ago. Notice of deprecation and removal has
been communicated via prior release notes (i.e.
http://docs.opendaylight.org/en/stable-oxygen/release-notes/projects/controller.html
).


>
>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>
>
>


Re: [controller-dev] Genius distribution checks getting stuck at DataBrokerFailureTest

2018-05-16 Thread Tom Pantelis
On Wed, May 16, 2018 at 11:33 AM, Faseela K  wrote:

> +controller-dev
>
>
>
> Does this patch in controller have any impact?
>
>
>
> https://git.opendaylight.org/gerrit/#/c/71547/
>

https://git.opendaylight.org/gerrit/#/c/71581/ fixed it. Sorry about that.


>
>
> Thanks,
>
> Faseela
>
>
>
> *From:* Faseela K
> *Sent:* Wednesday, May 16, 2018 8:33 PM
> *To:* 'release (rele...@lists.opendaylight.org)' <
> rele...@lists.opendaylight.org>; genius-...@lists.opendaylight.org
> *Subject:* Genius distribution checks getting stuck at
> DataBrokerFailureTest
>
>
>
> *11:21:02* Running org.opendaylight.genius.datastoreutils.testutils.infra.tests.AutoCloseableModuleTest
>
> *11:21:04* Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.487 sec - in org.opendaylight.genius.datastoreutils.testutils.infra.tests.AutoCloseableModuleTest
>
> *11:21:04* Running org.opendaylight.genius.datastoreutils.testutils.tests.TestableJobCoordinatorEventsWaiterTest
>
> *11:21:08* Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.726 sec - in org.opendaylight.genius.datastoreutils.testutils.tests.TestableJobCoordinatorEventsWaiterTest
>
> *11:21:08* Running org.opendaylight.genius.datastoreutils.testutils.tests.AbstractTestableListenerTest
>
> *11:21:08* Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.535 sec - in org.opendaylight.genius.datastoreutils.testutils.tests.AbstractTestableListenerTest
>
> *11:21:08* Running org.opendaylight.genius.datastoreutils.testutils.tests.DataBrokerFailuresTest
>
>
>
> https://jenkins.opendaylight.org/releng/job/genius-maven-verify-fluorine-mvn33-openjdk8/773/console
>
>
>
>
>
> Thanks,
>
> Faseela
>
>
>


Re: [controller-dev] Restconf no-auth features

2018-04-26 Thread Tom Pantelis
On Thu, Apr 26, 2018 at 1:57 PM, FREEMAN, BRIAN D <bf1...@att.com> wrote:

> How would that affect end users (if at all) ?
>

It wouldn't, unless an end user doesn't want authentication, which I doubt
anyone would in production. All other REST endpoints in ODL (that I know
of) only support authenticated access.


>
>
> brian
>
>
>
> *From:* controller-dev-boun...@lists.opendaylight.org <
> controller-dev-boun...@lists.opendaylight.org> *On Behalf Of *Tom Pantelis
> *Sent:* Thursday, April 26, 2018 1:55 PM
> *To:* netconf-dev <netconf-...@lists.opendaylight.org>; controller-dev <
> controller-dev@lists.opendaylight.org>
> *Subject:* [controller-dev] Restconf no-auth features
>
>
>
> Hello,
>
>
>
> I am in the process of converting restconf to use the new web API in aaa
> in lieu of the web.xml files. Supporting no-auth is a bit of a pain. I have
> maintained no-auth support for the current draft02 restconf endpoint for
> legacy. However I'd like to drop no-auth for the new(er) rfc8040 endpoint
> to simplify. Any objections?
>
>
>
> Tom
>


[controller-dev] Restconf no-auth features

2018-04-26 Thread Tom Pantelis
Hello,

I am in the process of converting restconf to use the new web API in aaa in
lieu of the web.xml files. Supporting no-auth is a bit of a pain. I have
maintained no-auth support for the current draft02 restconf endpoint for
legacy. However I'd like to drop no-auth for the new(er) rfc8040 endpoint
to simplify. Any objections?

Tom


[controller-dev] Pax exam service lookup timeouts

2018-04-17 Thread Tom Pantelis
Hello,

With the latest version bump, we're seeing these errors more often, for some
reason:

ServiceLookupException: gave up waiting for service org.ops4j.pax.exam.ProbeInvoker

Looking at the surefire output file, I see this error:

Error in initialization script:
/w/workspace/controller-maven-verify-fluorine-mvn33-openjdk8/opendaylight/md-sal/samples/toaster-it/target/exam/6f474bde-ed1f-466d-9b9c-7a25dad45e1b/etc/shell.init.script:
String index out of range: 0
karaf@root()>

This seems to be the common denominator. I submitted
https://git.opendaylight.org/gerrit/#/c/71041/
to set configureConsole().ignoreLocalConsole() like SFT does. Let's see if
that alleviates the failures.

Tom


Re: [controller-dev] SingleFeatureTest fails.

2018-03-21 Thread Tom Pantelis
On Wed, Mar 21, 2018 at 7:15 AM, Michael Vorburger 
wrote:

> Arthi,
>
> On Mon, Mar 19, 2018 at 2:17 PM, Arthi Bhattacharjee <
> arthi_bhattacharje...@yahoo.in> wrote:
>
>> Hi Controller-dev team,
>>
>>
>>
>> I am using sal.binding.api.BindingAwareBroker.ProviderContext to get
>> MountPointService. While doing so, I'm facing SingleFeatureTest failures.
>> Below are the failures:
>>
> Tests in error:
>>
>>   Condition with alias 'checkBundleDiagInfos' didn't complete within 300
>> seconds because lambda expression in 
>> org.opendaylight.odlparent.bundlestest.TestBundleDiag:
>> expected system either ready with all bundles Active, or Stopping or
>> Failure (but not still booting in GracePeriod, Waiting, Starting,
>> Unknown;but just Resolved and some exceptional Installed OK) but was > Booting {Installed=0, Resolved=4, Unknown=0, GracePeriod=1, Waiting=0,
>> Starting=0, Active=384, Stopping=0, Failure=0}
>>
>> 1. NOK org.opendaylight.TrafficEngineering.impl: OSGi state = Active,
>> Karaf bundleState = GracePeriod, due to: Blueprint
>>
>> 3/19/18 5:37 PM
>>
>> Missing dependencies:
>>
>> (objectClass=org.opendaylight.controller.sal.binding.api.BindingAwareBroker.ProviderContext)
>>
>
> Do you hit this while you build controller source code, or when using this
> in custom code of yours?
>
> There are sometimes timing issues in SingleFeatureTest (SFT). Have you
> retried? It may well just work again. Or is it consistent?
>
> Just as a test: when you run "mvn -Pq clean install", which includes
> -Dsft.diag.skip=true, does your ODL feature work when you manually install
> it into your Karaf?
>
> NOTE: We are using Carbon Release.
>>
>
> FYI we don't work much with Carbon anymore to support you here on
> upstream; if you can try a later release, you would likely get better
> support (generally speaking).
>
>
>> Are there any dependencies that I am missing?
>>
>

You didn't provide your blueprint XML, so I can't tell what you're doing, but
the error indicates you're trying to import a BindingAwareBroker.ProviderContext
OSGi service, and there is no such service advertised (AFAIK). In fact,
the BindingAwareBroker is legacy from before blueprint adoption, so there's
no reason to use it with blueprint (in fact I don't even know how one would
use it - never tried it :)).


>
>>
>> Looking forward to the response.
>>
>>
>>
>> Thanks,
>>
>> Arthi
>>
>> Sent from Yahoo Mail on Android
>> 
>>
>>
>>
>
>
>


Re: [controller-dev] [release] Autorelease oxygen failed to build sal-binding-it from controller

2018-03-16 Thread Tom Pantelis
On Fri, Mar 16, 2018 at 11:10 AM, Daniel Farrell 
wrote:

> Is this the relevant error?
>
> [ERROR] test(org.opendaylight.controller.test.sal.binding.it.DataServiceIT)
> Time elapsed: 195.804 s  <<< ERROR!
> org.ops4j.pax.swissbox.tracker.ServiceLookupException: gave up waiting
> for service org.ops4j.pax.exam.ProbeInvoker
>
> Anything we can do about it?
>

I believe that's an intermittent problem we've been seeing for a while with
pax-exam in general.


>
> Thanks,
> Daniel
>
> On Thu, Mar 15, 2018 at 9:45 PM Jenkins  opendaylight.org> wrote:
>
>> Attention controller-devs,
>>
>> Autorelease oxygen failed to build sal-binding-it from controller in build
>> 219. Attached is a snippet of the error message related to the
>> failure that we were able to automatically parse as well as console logs.
>>
>>
>> Console Logs:
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/autorelease-release-oxygen/219
>>
>> Jenkins Build:
>> https://jenkins.opendaylight.org/releng/job/autorelease-release-oxygen/219/
>>
>> Please review and provide an ETA on when a fix will be available.
>>
>> Thanks,
>> ODL releng/autorelease team
>>
>> ___
>> release mailing list
>> rele...@lists.opendaylight.org
>> https://lists.opendaylight.org/mailman/listinfo/release
>>
>
>
>


Re: [controller-dev] List element cohort validation causes issue

2018-03-10 Thread Tom Pantelis
On Sat, Mar 10, 2018 at 11:53 AM, Satish Dutt  wrote:

> Hi All,
>
>
>
> I have a cohort validation for the list element. If I try to create the
> entire list with multiple items, I get the error below. If I create a
> single list element, it is successful. Is there any way I
> can avoid this issue? I am trying this in Boron-SR3.
>
>
>
>
>
> 2018-03-07 14:53:01,681 | ERROR | ult-dispatcher-4 | OneForOneStrategy | 153 - com.typesafe.akka.slf4j - 2.4.7 | Unexpected message class org.opendaylight.controller.cluster.datastore.DataTreeCohortActor$CanCommit in cohort behavior PostCanCommit
>
> java.lang.UnsupportedOperationException: Unexpected message class org.opendaylight.controller.cluster.datastore.DataTreeCohortActor$CanCommit in cohort behavior PostCanCommit
> at org.opendaylight.controller.cluster.datastore.DataTreeCohortActor$CohortBehaviour.handle(DataTreeCohortActor.java:152)[170:org.opendaylight.controller.sal-distributed-datastore:1.4.3.Boron-SR3]
> at org.opendaylight.controller.cluster.datastore.DataTreeCohortActor.handleReceive(DataTreeCohortActor.java:45)[170:org.opendaylight.controller.sal-distributed-datastore:1.4.3.Boron-SR3]
> at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(AbstractUntypedActor.java:26)[164:org.opendaylight.controller.sal-clustering-commons:1.4.3.Boron-SR3]
> at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)[152:com.typesafe.akka.actor:2.4.7]
> at akka.actor.Actor$class.aroundReceive(Actor.scala:484)[152:com.typesafe.akka.actor:2.4.7]
> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)[152:com.typesafe.akka.actor:2.4.7]
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)[152:com.typesafe.akka.actor:2.4.7]
> at akka.actor.ActorCell.invoke(ActorCell.scala:495)[152:com.typesafe.akka.actor:2.4.7]
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)[152:com.typesafe.akka.actor:2.4.7]
> at akka.dispatch.Mailbox.run(Mailbox.scala:224)[152:com.typesafe.akka.actor:2.4.7]
> at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[152:com.typesafe.akka.actor:2.4.7]
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)[148:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)[148:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)[148:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)[148:org.scala-lang.scala-library:2.11.8.v20160304-115712-1706a37eb8]
>
>
>

This was fixed by https://git.opendaylight.org/gerrit/#/c/51584/ which is
in Nitrogen.


>
>
> Regards
>
> -Satish
>
>
>


Re: [controller-dev] Need Input on Geo Cluster Behavior for Node Isolation/Un-isolation

2018-02-27 Thread Tom Pantelis
On Fri, Feb 16, 2018 at 12:42 AM, Chethana Lakshmanappa <
cheth...@luminanetworks.com> wrote:

> Hi All,
>
> Kindly need your input on some of the behavior seen in Geo cluster setup
> when a node is isolated and un-isolated.
>
> Suppose Geo cluster has nodes A, B and C residing in one primary data
> center which is voting and D, E & F residing in secondary data center which
> is non-voting:
>
>- If a node is Isolated, let's say Node B, then immediately in the
>cluster all nodes are unreachable to each other.
>
>
That is odd. How do you know that all nodes became unreachable to each
other? The log excerpt below just indicates that 10.18.130.105 lost
reachability with 10.18.130.103 (Node B, I assume), which is expected. The
message "Leader can currently not perform its duties" means that the akka
cluster leader cannot allow new nodes to be added to the cluster or nodes
removed until the lost node comes back or is downed.
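The rule described above can be written down as a small predicate. This is a toy model in plain Java, not Akka code: `LeaderDutiesModel`, `Status` and `canPerformDuties` are hypothetical names, and the real Akka convergence check also looks at more member states and at the gossip "seen" flags visible in the log below.

```java
import java.util.Map;
import java.util.Set;

// Toy model only (hypothetical names, not the Akka API): the cluster leader
// may only change membership (admit joining nodes, remove old ones) while no
// member it still counts is unreachable.
final class LeaderDutiesModel {

    /** Simplified membership status; Akka has more states than these. */
    enum Status { UP, DOWN }

    private LeaderDutiesModel() { }

    /**
     * @param status      membership status per node address
     * @param unreachable node addresses currently flagged unreachable
     * @return true if the leader can perform its duties
     */
    static boolean canPerformDuties(Map<String, Status> status, Set<String> unreachable) {
        for (String node : unreachable) {
            // An unreachable member blocks the leader until it becomes
            // reachable again (leaves this set) or is marked Down.
            if (status.get(node) != Status.DOWN) {
                return false;
            }
        }
        return true;
    }
}
```

In this model, isolating Node B puts it in the unreachable set, so `canPerformDuties` returns false (the "Leader can currently not perform its duties" situation); once B is reachable again or has been downed, it returns true.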

>
>- All nodes wait for a threshold amount of time before marking Node B
>as quarantined and then reachability within the cluster is restored.
>
>- What is the threshold amount of time it needs to wait?
>   - If the node goes down or is stopped, this behavior is not seen. It
>   is seen only when it is isolated. How is this different from node down?
>
>
> *Log excerpt from Node A when Node B is isolated:*
> 130.103:2550] has failed, address is now gated for [5000] ms. Reason:
> [Disassociated]
> 2018-02-15 19:53:56,109 | INFO  | lt-dispatcher-22 | kka://opendaylight-cluster-data) | 113 - com.typesafe.akka.slf4j - 2.4.18
> | Cluster Node [akka.tcp://opendaylight-cluster-data@10.18.130.105:2550]
> - Leader can currently not perform its duties, reachability status: [
> akka.tcp://opendaylight-cluster-data@10.18.130.105:2550 -> akka.tcp://opendaylight-cluster-data@10.18.130.103:2550: Unreachable [Unreachable] (1)],
> member status: [akka.tcp://opendaylight-cluster-data@10.18.130.103:2550 Up seen=false,
> akka.tcp://opendaylight-cluster-data@10.18.130.105:2550 Up seen=true,
> akka.tcp://opendaylight-cluster-data@10.18.130.84:2550 Up seen=true,
> akka.tcp://opendaylight-cluster-data@10.18.131.27:2550 Up seen=true,
> akka.tcp://opendaylight-cluster-data@10.18.131.31:2550 Up seen=true,
> akka.tcp://opendaylight-cluster-data@10.18.131.39:2550 Up seen=true]
>
>
>
>- If a *Shard Leader* is Isolated, let’s say you make Node A as shard
>leader for all shards and data store. On isolating and un-isolating Node A,
>I see the following:
>
>- Primary voting nodes are unreachable to secondary nodes and vice
>   versa. Cluster never recovers and all nodes need to be restarted to have
>   cluster working. *Is this a bug?*
>   - Also the isolated node which is un-isolated is unreachable to
>   primary voting nodes and never recovers.
>
>
It may be that, on un-isolation, split brain occurred in akka with 2
cluster leaders. I assume that Node A was the akka cluster leader when it
was isolated - it would be interesting to see whether this also occurs if a
node that is not the cluster leader is isolated.

Also make sure you do not have the auto-down-unreachable-after option
enabled in the akka.conf.


>
> *Log excerpt:*
> 2018-02-15 19:32:47,174 | INFO  | lt-dispatcher-19 | kka://opendaylight-cluster-data) | 113 - com.typesafe.akka.slf4j - 2.4.18
> | Cluster Node [akka.tcp://opendaylight-cluster-data@10.18.131.27:2550]
> - Leader can currently not perform its duties, reachability status: [
> akka.tcp://opendaylight-cluster-data@10.18.130.105:2550 -> akka.tcp://opendaylight-cluster-data@10.18.130.103:2550: Unreachable [Terminated] (1),
> akka.tcp://opendaylight-cluster-data@10.18.130.105:2550 -> akka.tcp://opendaylight-cluster-data@10.18.130.84:2550: Unreachable [Unreachable] (2),
> akka.tcp://opendaylight-cluster-data@10.18.131.27:2550 -> akka.tcp://opendaylight-cluster-data@10.18.130.103:2550: Terminated [Terminated] (4),
> akka.tcp://opendaylight-cluster-data@10.18.131.27:2550 -> akka.tcp://opendaylight-cluster-data@10.18.130.84:2550: Unreachable [Unreachable] (2),
> akka.tcp://opendaylight-cluster-data@10.18.131.31:2550 -> akka.tcp://opendaylight-cluster-data@10.18.130.103:2550: Unreachable [Terminated] (3),
> akka.tcp://opendaylight-cluster-data@10.18.131.31:2550 -> akka.tcp://opendaylight-cluster-data@10.18.130.84:2550: Unreachable [Unreachable] (2),
> akka.tcp://opendaylight-cluster-data@10.18.131.39:2550 -> akka.tcp://opendaylight-cluster-data@10.18.130.103:2550: Unreachable [Terminated] (3),
> akka.tcp://opendaylight-cluster-data@10.18.131.39:2550 -> akka.tcp://opendaylight-cluster-data@10.18.130.84:2550: Unreachable [Unreachable] (2)],
> member status: [akka.tcp://opendaylight-cluster-data@10.18.130.103:2550 Down seen=false,
> akka.tcp://opendaylight-cluster-data@10.18.130.105:2550 WeaklyUp seen=true,
> akka.tcp://opendaylight-cluster-data@10.18.130.84:2550 Up seen=false,
> akka.tcp://opendaylight-cluster-data@10.18.131.27:2550 Up

Re: [controller-dev] Carbon SR3: circuit breaker timed out. Transaction aborted due to shutdown.

2018-02-16 Thread Tom Pantelis
On Fri, Feb 16, 2018 at 2:35 PM, Jamo Luhrsen <jluhr...@gmail.com> wrote:

>
>
> On 2/16/18 11:33 AM, Tom Pantelis wrote:
> >
> >
> > On Fri, Feb 16, 2018 at 2:26 PM, Jamo Luhrsen <jluhr...@gmail.com
> <mailto:jluhr...@gmail.com>> wrote:
> >
> > I'm analyzing CSIT failures for our Carbon SR3 candidate.
> >
> > Something nasty went wrong in a netvirt CSIT job in the middle of
> > the robot tests. Seems like all functionality is probably broken
> > after that.
> >
> > in the karaf.log [0] I see a message about some akka circuit breaker
> > Timed out, then a bunch of RuntimeExceptions: Transaction
> > aborted due to shutdown.
> >
> >
> > yeah that means akka persistence failed, ie it timed out waiting for
> data to be written to the disk. That kills the
> > shard actor with no recovery.  This can happen if there's slow disk
> access/contention in the env - seen this happen
> > before with internal CSIT env before the disk issue was resolved.
>
> Thanks. I'll report to the infra guys that we are still likely seeing
> some high disk IO latency. There was another job with similar issues.
>

The timeout(s) can be increased in akka.conf (I would have to look up the
exact settings) if it's really problematic, although that's really just a band-aid.


>
> JamO
>
> > Any ideas what's happening here?
> >
> > Thanks,
> > JamO
> >
> > [0]https://logs.opendaylight.org/releng/vex-yul-odl-
> jenkins-1/netvirt-csit-1node-openstack-pike-upstream-
> stateful-snat-conntrack-carbon/200/odl_1/odl1_karaf.log.gz
> > <https://logs.opendaylight.org/releng/vex-yul-odl-
> jenkins-1/netvirt-csit-1node-openstack-pike-upstream-
> stateful-snat-conntrack-carbon/200/odl_1/odl1_karaf.log.gz>
> >
> >
>


Re: [controller-dev] Carbon SR3: circuit breaker timed out. Transaction aborted due to shutdown.

2018-02-16 Thread Tom Pantelis
On Fri, Feb 16, 2018 at 2:26 PM, Jamo Luhrsen  wrote:

> I'm analyzing CSIT failures for our Carbon SR3 candidate.
>
> Something nasty went wrong in a netvirt CSIT job in the middle of
> the robot tests. Seems like all functionality is probably broken
> after that.
>
> in the karaf.log [0] I see a message about some akka circuit breaker
> Timed out, then a bunch of RuntimeExceptions: Transaction
> aborted due to shutdown.
>

Yeah, that means akka persistence failed, i.e. it timed out waiting for data
to be written to the disk. That kills the shard actor with no recovery.
This can happen if there's slow disk access/contention in the environment - we've
seen this happen before in the internal CSIT environment, before the disk
issue was resolved.
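A rough illustration of that failure mode, using a plain `ExecutorService` in place of the journal and `Future.get` with a timeout in place of Akka persistence's journal circuit breaker. The names `PersistenceTimeoutModel`, `Outcome` and `persist` are hypothetical; the only behaviour modelled is the one described above: a journal write that does not finish within the call timeout is treated as a failed persist, and the shard actor is stopped.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy model only (hypothetical names): a journal write slower than the
// circuit breaker's call timeout counts as a failed persist.
final class PersistenceTimeoutModel {

    enum Outcome { PERSISTED, SHARD_STOPPED }

    private PersistenceTimeoutModel() { }

    /**
     * Simulates one journal write taking {@code writeMillis} (e.g. slow or
     * contended disk) against a call timeout of {@code callTimeoutMillis}.
     */
    static Outcome persist(long writeMillis, long callTimeoutMillis) {
        ExecutorService disk = Executors.newSingleThreadExecutor();
        try {
            Future<Object> write = disk.submit(() -> {
                Thread.sleep(writeMillis); // stand-in for the disk write
                return null;
            });
            write.get(callTimeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.PERSISTED;
        } catch (TimeoutException | ExecutionException e) {
            // The write did not complete in time (or failed): the persistent
            // actor is stopped and does not come back on its own.
            return Outcome.SHARD_STOPPED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.SHARD_STOPPED;
        } finally {
            disk.shutdownNow(); // interrupts a still-running simulated write
        }
    }
}
```

Raising the timeout only moves the threshold (a 2000 ms write would pass with a 5000 ms timeout), which is why increasing it in akka.conf is only a band-aid for slow disk I/O.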


>
> Any ideas what's happening here?
>
> Thanks,
> JamO
>
> [0]https://logs.opendaylight.org/releng/vex-yul-odl-
> jenkins-1/netvirt-csit-1node-openstack-pike-upstream-
> stateful-snat-conntrack-carbon/200/odl_1/odl1_karaf.log.gz
>


Re: [controller-dev] Getting Schema Service via config sub-system

2018-02-14 Thread Tom Pantelis
Sonu,

I would suggest switching to blueprint - the config subsystem is deprecated
and is scheduled for removal in Fluorine.

Tom

On Wed, Feb 14, 2018 at 1:52 AM, Sonu Gupta 
wrote:

> Hi All,
>
> I am facing an issue while getting the schema service via the config
> subsystem.
>
> In my provider-impl.yang file I have this:
>
>  container root-schema-service {
>
> uses config:service-ref {
>
>  refine type {
>
>  mandatory true;
>  config:required-identity dom:schema-service;
>
> }
>
> }
>
>  }
>
>
> and
>
> default-config.xml
>
> 
>
>  
> 

Re: [controller-dev] compiling error about mdsal-eos-binding-adapter snapshot dependency when switch to odlparent 3.0.2

2018-02-06 Thread Tom Pantelis
It's not just merge jobs with random failures. Can anyone explain why this
AR job failed:
https://jenkins.opendaylight.org/releng/job/openflowplugin-validate-autorelease-oxygen/227/console
?

13:49:24 [INFO]
13:49:24 [INFO] Reactor Summary:
13:49:24 [INFO]
13:49:24 [INFO] mdsal-artifacts  SUCCESS [ 35.062 s]
13:49:24 [INFO] mdsal-model-artifacts .. SUCCESS [ 34.991 s]
13:49:24 [INFO] config-artifacts ... SUCCESS [ 42.971 s]
13:49:24 [INFO] yang-test-plugin ... SUCCESS [01:13 min]
13:49:24 [INFO] mdsal-artifacts  SUCCESS [ 43.020 s]
13:49:24 [INFO] autorelease-validate-projects .. SUCCESS [ 35.985 s]
13:49:24 [INFO]
13:49:24 [INFO] BUILD SUCCESS
13:49:24 [INFO]
13:49:24 [INFO] Total time: 01:41 min (Wall Clock)
13:49:24 [INFO] Finished at: 2018-02-06T13:49:24+00:00
13:49:24 [INFO] Final Memory: 51M/309M


It all indicates success but still mysteriously fails. I don't see any
reason why in the output.

On Tue, Feb 6, 2018 at 8:45 AM, Thanh Ha 
wrote:

> On Tue, Feb 6, 2018 at 5:20 AM, Robert Varga  wrote:
>
>> On 06/02/18 06:53, Thanh Ha wrote:
>> > The old-style mdsal-merge job deployed and I kicked off a build:
>> >
>> > https://jenkins.opendaylight.org/releng/view/Merge-Jobs/job/mdsal-merge-oxygen/1/console
>> >
>> > It takes about 50 minutes for mdsal to build so we'll know in ~50
>> > minutes if that fixed the issue.
>>
>> Hello Thanh,
>>
>> the one thing I found is a difference in file upload strategy in the two
>> jobs:
>>
>> old:
>>
>> > Deploying the main artifact iana-afn-safi-2013.07.04.12.0-SNAPSHOT.jar
>> > Downloading: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/2013.07.04.12.0-SNAPSHOT/maven-metadata.xml
>> > Downloaded: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/2013.07.04.12.0-SNAPSHOT/maven-metadata.xml (2 KB at 20.8 KB/sec)
>> > Uploading: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/2013.07.04.12.0-SNAPSHOT/iana-afn-safi-2013.07.04.12.0-20180206.071107-112.jar
>> > Uploaded: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/2013.07.04.12.0-SNAPSHOT/iana-afn-safi-2013.07.04.12.0-20180206.071107-112.jar (23 KB at 2.4 KB/sec)
>> > Uploading: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/2013.07.04.12.0-SNAPSHOT/iana-afn-safi-2013.07.04.12.0-20180206.071107-112.pom
>> > Uploaded: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/2013.07.04.12.0-SNAPSHOT/iana-afn-safi-2013.07.04.12.0-20180206.071107-112.pom (2 KB at 7.5 KB/sec)
>> > Downloading: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/maven-metadata.xml
>> > Downloaded: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/maven-metadata.xml (481 B at 31.3 KB/sec)
>> > Uploading: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/2013.07.04.12.0-SNAPSHOT/maven-metadata.xml
>> > Uploaded: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/2013.07.04.12.0-SNAPSHOT/maven-metadata.xml (2 KB at 4.3 KB/sec)
>> > Uploading: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/maven-metadata.xml
>> > Uploaded: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/maven-metadata.xml (481 B at 0.6 KB/sec)
>> > Deploying the main artifact iana-afn-safi-2013.07.04.12.0-SNAPSHOT-javadoc.jar
>> > Uploading: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/2013.07.04.12.0-SNAPSHOT/iana-afn-safi-2013.07.04.12.0-20180206.071107-112-javadoc.jar
>> > Uploaded: https://nexus.opendaylight.org/content/repositories/opendaylight.snapshot/org/opendaylight/mdsal/model/iana-afn-safi/2013.07.04.12.0-SNAPSHOT/iana-afn-safi-2013.07.04.12.0-20180206.071107-112-javadoc.jar (50 KB at 58.1 KB/sec)
>> > Uploading: 

Re: [controller-dev] [release] Autorelease oxygen failed to build sal-remoterpc-connector from controller

2018-01-25 Thread Tom Pantelis
On Thu, Jan 25, 2018 at 1:42 PM, Michael Vorburger 
wrote:

> On Thu, Jan 25, 2018 at 6:55 PM, Jenkins  opendaylight.org> wrote:
>
>> Attention controller-devs,
>>
>> Autorelease oxygen failed to build sal-remoterpc-connector from
>> controller in build
>> 124. Attached is a snippet of the error message related to the
>> failure that we were able to automatically parse as well as console logs.
>>
>>
>> Console Logs:
>> https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/autorelease-release-oxygen/124
>>
>> Jenkins Build:
>> https://jenkins.opendaylight.org/releng/job/autorelease-release-oxygen/124/
>
>
> Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 5.47 sec <<< FAILURE! - in org.opendaylight.controller.remote.rpc.registry.mbeans.RemoteRpcRegistryMXBeanImplTest
> testFindRpcByRoute(org.opendaylight.controller.remote.rpc.registry.mbeans.RemoteRpcRegistryMXBeanImplTest)  Time elapsed: 0.98 sec  <<< ERROR!
> java.lang.IllegalStateException: Attempted to access local bucket before recovery completed
>   at com.google.common.base.Preconditions.checkState(Preconditions.java:501)
>   at org.opendaylight.controller.remote.rpc.registry.gossip.BucketStoreActor.getLocalBucket(BucketStoreActor.java:384)
>   at org.opendaylight.controller.remote.rpc.registry.gossip.BucketStoreActor.getLocalData(BucketStoreActor.java:110)
>   at org.opendaylight.controller.remote.rpc.registry.mbeans.RemoteRpcRegistryMXBeanImpl.findRpcByRoute(RemoteRpcRegistryMXBeanImpl.java:91)
>   at org.opendaylight.controller.remote.rpc.registry.mbeans.RemoteRpcRegistryMXBeanImplTest.testFindRpcByRoute(RemoteRpcRegistryMXBeanImplTest.java:142)
>
> anyone got an idea how to make this RemoteRpcRegistryMXBeanImplTest more
> reliable?
>
> Perhaps increase some timeout?
>


https://git.opendaylight.org/gerrit/#/c/67593/ should fix it.
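That change is not reproduced here. As a generic illustration of the "increase some timeout" idea raised above, a test like this can poll for its precondition (here, that the bucket store has finished recovery) within a bounded timeout instead of asserting immediately. A minimal sketch in plain JDK Java, assuming nothing about the controller code; the `Await` class is hypothetical and is not necessarily what the patch does:

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; not part of the controller codebase. */
final class Await {
    private Await() {
    }

    /**
     * Evaluates {@code condition} every {@code interval} until it returns true
     * or {@code timeout} elapses. Returns the last result instead of throwing,
     * so the calling test decides how to report a timeout.
     */
    static boolean until(BooleanSupplier condition, Duration timeout, Duration interval) {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition holding
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
    }
}
```

In a test, the assertion would then look something like `assertTrue(Await.until(() -> isRecovered(), Duration.ofSeconds(5), Duration.ofMillis(50)))`, where `isRecovered()` stands for a hypothetical readiness check. Libraries such as Awaitility package the same pattern with more options.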


>
>

>
>> Please review and provide an ETA on when a fix will be available.
>>
>> Thanks,
>> ODL releng/autorelease team
>>
>>
>>
>>
>
> ___
> release mailing list
> rele...@lists.opendaylight.org
> https://lists.opendaylight.org/mailman/listinfo/release
>
>


Re: [controller-dev] odlparent 3.0.2 for CONTROLLER-1799: Archetype self test during Maven build

2018-01-22 Thread Tom Pantelis
On Mon, Jan 22, 2018 at 9:26 PM, Michael Vorburger <vorbur...@redhat.com>
wrote:

> On Mon, Jan 22, 2018 at 7:17 PM, Michael Vorburger <vorbur...@redhat.com>
> wrote:
>
>> On Thu, Jan 18, 2018 at 5:25 PM, Tom Pantelis <tompante...@gmail.com>
>> wrote:
>>
>>> On Thu, Jan 18, 2018 at 10:45 AM, Michael Vorburger <
>>> vorbur...@redhat.com> wrote:
>>>
>>>> On Wed, Nov 29, 2017 at 12:29 PM, Michael Vorburger <
>>>> vorbur...@redhat.com> wrote:
>>>>
>>>>> On Mon, Nov 27, 2017 at 8:10 PM, Michael Vorburger <
>>>>> vorbur...@redhat.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> almost 3 months ago, on
>>>>>> https://lists.opendaylight.org/pipermail/controller-dev/2017-September/013889.html, I had
>>>>>> started a thread re. the mysterious problems hit in the controller
>>>>>> archetype "self test", which occurred on Gerrit and Jenkins, and only
>>>>>> there (it worked locally even 3 months ago).
>>>>>>
>>>>>> Today I finally made time to progress on this, and
>>>>>> https://jira.opendaylight.org/browse/CONTROLLER-1799 has the write-up of
>>>>>> what is going on there, documented for future reference.
>>>>>>
>>>>>> Is it feasible to get an odlparent 3.0.2 with
>>>>>> https://git.opendaylight.org/gerrit/#/c/65940/ for
>>>>>> https://git.opendaylight.org/gerrit/#/c/65941/ ?
>>>>>>
>>>>>> Tx,
>>>>>> M.
>>>>>>
>>>>>> PS: I'm hoping to work on https://jira.opendaylight.org/browse/INFRAUTILS-17
>>>>>> in the coming days, which will likely also
>>>>>> require a change in odlparent; perhaps this and that could be pooled
>>>>>> together into a 3.0.2 - anything else?
>>>>>>
>>>>>
>>>>> https://git.opendaylight.org/gerrit/#/c/66030/ and its related
>>>>> changes are what I meant here; if all of this, together with
>>>>> https://git.opendaylight.org/gerrit/#/c/65940/, could be released as
>>>>> an odlparent 3.0.2, in the hopefully not-too-distant future, that would be
>>>>> fabulous.
>>>>>
>>>>
>>>> would any fellow controller committer be willing to merge this
>>>> https://git.opendaylight.org/gerrit/#/c/65941/ now?
>>>>
>>>> Tom, or I (I volunteer to), could then rebase
>>>> https://git.opendaylight.org/gerrit/#/c/66545/ on top of that c/65941, and then I'm happy to
>>>> merge that one after.
>>>>
>>>> It would be good to get both of these archetype things into Oxygen
>>>> still IMHO.
>>>>
>>>
>>> Agree - I rebased - will merge after
>>>
>>
>> just done, but hit https://jira.opendaylight.org/browse/CONTROLLER-1810
>> .. following the Big Bump, the Archetype IT is actually broken.. seems to
>> have something to do with some (shutdown related?) problem in AAA ? Don't
>> be shy to comment on CONTROLLER-1810 if you have any clue what could be
>> causing that.
>>
>
>> I'm hoping to merge https://git.opendaylight.org/gerrit/#/c/66545/ with
>> that @Ignore ASAP anyway, if nobody has any objections; we can then
>> subsequently remove the @Ignore in the archetype IT, when CONTROLLER-1810 is
>> sorted.
>>
>
> FYI https://git.opendaylight.org/gerrit/#/c/66545/ is now finally merged,
> but without IT; in addition to https://jira.opendaylight.org/browse/CONTROLLER-1810
> there seems to be a second (new?) problem with the IT of the archetype,
> detailed in https://jira.opendaylight.org/browse/CONTROLLER-1811.
>
> What are people's views re. ditching the IT and/or CLI (suggested by Sam &
> Tom) from the archetype?
>

I think we should keep the startup archetype simple/basic and have an
advanced archetype that has the extra fluff (IT, CLI, etc.).


Re: [controller-dev] odlparent 3.0.2 for CONTROLLER-1799: Archetype self test during Maven build

2018-01-18 Thread Tom Pantelis
On Thu, Jan 18, 2018 at 10:45 AM, Michael Vorburger 
wrote:

> On Wed, Nov 29, 2017 at 12:29 PM, Michael Vorburger 
> wrote:
>
>> On Mon, Nov 27, 2017 at 8:10 PM, Michael Vorburger 
>> wrote:
>>
>>> Hello,
>>>
>>> almost 3 months ago, on
>>> https://lists.opendaylight.org/pipermail/controller-dev/2017-September/013889.html, I had started
>>> a thread re. the mysterious problems hit in the controller archetype "self
>>> test", which occurred on Gerrit and Jenkins, and only there (it worked
>>> locally even 3 months ago).
>>>
>>> Today I finally made time to progress on this, and
>>> https://jira.opendaylight.org/browse/CONTROLLER-1799 has the write-up of
>>> what is going on there, documented for future reference.
>>>
>>> Is it feasible to get an odlparent 3.0.2 with
>>> https://git.opendaylight.org/gerrit/#/c/65940/ for
>>> https://git.opendaylight.org/gerrit/#/c/65941/ ?
>>>
>>> Tx,
>>> M.
>>>
>>> PS: I'm hoping to work on https://jira.opendaylight.org/browse/INFRAUTILS-17
>>> in the coming days, which will likely also require
>>> a change in odlparent; perhaps this and that could be pooled together into
>>> a 3.0.2 - anything else?
>>>
>>
>> https://git.opendaylight.org/gerrit/#/c/66030/ and its related changes
>> are what I meant here; if all of this, together with
>> https://git.opendaylight.org/gerrit/#/c/65940/, could be released as an
>> odlparent 3.0.2, in the hopefully not-too-distant future, that would be
>> fabulous.
>>
>
> would any fellow controller committer be willing to merge this
> https://git.opendaylight.org/gerrit/#/c/65941/ now?
>
> Tom, or I (I volunteer to), could then rebase
> https://git.opendaylight.org/gerrit/#/c/66545/ on top of that c/65941, and then I'm
> happy to merge that one after.
>
> It would be good to get both of these archetype things into Oxygen still
> IMHO.
>

Agree - I rebased - will merge after


>
>
>> --
>>> Michael Vorburger, Red Hat
>>> vorbur...@redhat.com | IRC: vorburger @freenode | ~ =
>>> http://vorburger.ch
>>>
>>
>


Re: [controller-dev] [Opendaylight-users] More Nitrogen weirdness

2018-01-05 Thread Tom Pantelis
On Fri, Jan 5, 2018 at 6:04 PM, Ryan Dietrich 
wrote:

> Ended up in the hospital over the holidays (I’m fine now), sorry I wasn’t
> able to follow up sooner.
>
> So, I checked out the controller repo from both GitHub and
> git.opendaylight.org, and your SHA for the commit (
> 02888d8e212ec0a79270c1e5824e0a491d7d2660) listed in your change isn’t
> “there”.  I assume some git rebase -i shenanigans are going on or something?
>

My patch https://git.opendaylight.org/gerrit/#/c/66545/ hasn't been merged
yet. Still needs review. If you need it now then you can clone the
controller project (master branch) and cherry-pick my patch. You can test
it out and then review and +1 the patch to facilitate merging (assuming you
don't find any issues of course).


>
>
>
>


Re: [controller-dev] Follow-up from ONAP F2F in Santa Clara

2018-01-05 Thread Tom Pantelis
On Fri, Jan 5, 2018 at 10:49 AM, FREEMAN, BRIAN D <bf1...@att.com> wrote:

> Scaling out a cluster is a hot topic as we plan ONAP S3P features.
>
>
>
> Is there a collective consensus on the “right” approach that we would want
> to take in ODL?
>
>
>
> I would think it would be based on microsharding or something where there
> would be the capability to split and merge shards across virtual cluster
> sets (or something smarter than this).
>
> It would seem that this has been solved by other clustered datastores – is
> there something to leverage?
>

Yes, it would involve microsharding.


>
>
> Brian
>
>
>
>
>
> *From:* controller-dev-boun...@lists.opendaylight.org [mailto:
> controller-dev-boun...@lists.opendaylight.org] *On Behalf Of *Tom Pantelis
> *Sent:* Friday, January 05, 2018 10:41 AM
> *To:* Ryan Goulding <ryandgould...@gmail.com>
> *Cc:* controller-dev <controller-dev@lists.opendaylight.org>; MACNIDER,
> JAMES <james.macni...@amdocs.com>
> *Subject:* Re: [controller-dev] Follow-up from ONAP F2F in Santa Clara
>
>
>
>
>
>
>
> On Fri, Jan 5, 2018 at 9:59 AM, Ryan Goulding <ryandgould...@gmail.com>
> wrote:
>
> Greetings, James.
>
>
>
> +controller-dev
>
>
>
> Forming a reply has been on my to-do list, but unfortunately I have been
> quite busy since the holidays.  Thanks for reaching out; I use Gmail
> exclusively now since I have worked on ODL at three different
> companies, and having the same email ensures some continuity of
> communication with peers.
>
>
>
> I am adding in Tom Pantelis.  From private conversations I had with him,
> we expose an RPC to do scale-out of the cluster.  Basically, we never
> productized or recommended it commercially, and there is likely some more
> scripting needed to update the new "seed" nodes to be in good shape to join
> the cluster.  I may be butchering terminology here, so I'll defer to Tom
> who is the clustering guru of ODL.
>
>
>
> Tom, this is James.  James works for Amdocs and was interested in
> scale-out of ODL.  I have added in controller-dev, as I am sure others
> would benefit from hearing how this is done!
>
>
>
>
>
> There is an RPC to add shard replicas and join an existing cluster
> (it implements the Raft AddServer RPC in the backend). However, it's mainly a
> building block - there need to be additional changes to configure the
> akka.conf et al. to talk to the existing cluster, etc.  Ideally this would
> be scripted and productized.
>
>
>
> That said, it isn't really needed for HA, where the cluster members are
> typically set up once and remain static, which so far has been the use case
> for ODL clustering, in my experience anyway.  Implementing scale-out is a
> different story - accomplishing that likely depends on the specific use
> case, application(s) and data models involved with specific sharding
> configurations on deployment.
>
>
>
> Thanks and Best Regards,
>
>
> Ryan Goulding
>
>
>
> On Fri, Jan 5, 2018 at 9:42 AM, James MacNider <james.macni...@amdocs.com>
> wrote:
>
> Hi Ryan,
>
>
>
> I hope that it’s alright that I’m contacting you at your Gmail address,
> it’s the only one I could find on the ONAP wiki.  I’d like to follow up on
> the brief exchange we had at the ONAP meeting in Santa Clara where we
> talked about the possibility of scaling an ODL cluster without
> reconfiguring all the nodes and restarting them.  I’ve been unable to find
> documentation that reflects this yet, but if I recall correctly, you knew
> the ODL contributor(s) that worked on this feature set.  Could you help me
> connect with them to learn more about this?
>
>
>
> Thanks,
>
>
>
> *James MacNider*
>
> Software Architect
>
>
>
> Open Network Division
>
> Amdocs Technology
>
> (office) (613)-595-5213
>
>
> *Amdocs* is a Platinum Member of ONAP <https://onap.org/>
>
>
>
> This message and the information contained herein is proprietary and
> confidential and subject to the Amdocs policy statement,
>
> you may review at https://www.amdocs.com/about/email-disclaimer
>
>
>
>
>


Re: [controller-dev] Follow-up from ONAP F2F in Santa Clara

2018-01-05 Thread Tom Pantelis
On Fri, Jan 5, 2018 at 9:59 AM, Ryan Goulding <ryandgould...@gmail.com>
wrote:

> Greetings, James.
>
> +controller-dev
>
> Forming a reply has been on my to-do list, but unfortunately I have been
> quite busy since the holidays.  Thanks for reaching out; I use Gmail
> exclusively now since I have worked on ODL at three different
> companies, and having the same email ensures some continuity of
> communication with peers.
>
> I am adding in Tom Pantelis.  From private conversations I had with him,
> we expose an RPC to do scale-out of the cluster.  Basically, we never
> productized or recommended it commercially, and there is likely some more
> scripting needed to update the new "seed" nodes to be in good shape to join
> the cluster.  I may be butchering terminology here, so I'll defer to Tom
> who is the clustering guru of ODL.
>
> Tom, this is James.  James works for Amdocs and was interested in
> scale-out of ODL.  I have added in controller-dev, as I am sure others
> would benefit from hearing how this is done!
>
>
There is an RPC to add shard replicas and join an existing cluster
(it implements the Raft AddServer RPC in the backend). However, it's mainly a
building block - there need to be additional changes to configure the
akka.conf et al. to talk to the existing cluster, etc.  Ideally this would
be scripted and productized.

That said, it isn't really needed for HA, where the cluster members are
typically set up once and remain static, which so far has been the use case
for ODL clustering, in my experience anyway.  Implementing scale-out is a
different story - accomplishing that likely depends on the specific use
case, application(s) and data models involved with specific sharding
configurations on deployment.
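A minimal sketch of invoking that RPC over RESTCONF from Java. It assumes the `cluster-admin:add-shard-replica` RPC with `shard-name` and `data-store-type` input leaves from the controller's cluster-admin YANG model, the `/restconf/operations/` endpoint of the RESTCONF northbound of that era on port 8181, and basic-auth credentials you supply; verify the RPC name, leaves and URL against your release. The `AddShardReplica` class is hypothetical, and the request is only built here, not sent. It is assumed to be called on the node that should host the new replica, and the akka.conf changes mentioned above still have to be made separately.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that builds (but does not send) an add-shard-replica RPC call. */
final class AddShardReplica {
    private AddShardReplica() {
    }

    /** JSON input body; shard names are assumed not to need JSON escaping. */
    static String body(String shardName, String dataStoreType) {
        return "{\"input\": {\"shard-name\": \"" + shardName
                + "\", \"data-store-type\": \"" + dataStoreType + "\"}}";
    }

    /** Builds a POST to the (assumed) RESTCONF operations URL with basic auth. */
    static HttpRequest request(String host, String user, String password,
            String shardName, String dataStoreType) {
        URI uri = URI.create("http://" + host
                + ":8181/restconf/operations/cluster-admin:add-shard-replica");
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + auth)
                .POST(HttpRequest.BodyPublishers.ofString(body(shardName, dataStoreType)))
                .build();
    }
}
```

Sending it would be `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; `data-store-type` is expected to be `config` or `operational`.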


> Thanks and Best Regards,
>
> Ryan Goulding
>
> On Fri, Jan 5, 2018 at 9:42 AM, James MacNider <james.macni...@amdocs.com>
> wrote:
>
>> Hi Ryan,
>>
>>
>>
>> I hope that it’s alright that I’m contacting you at your Gmail address,
>> it’s the only one I could find on the ONAP wiki.  I’d like to follow up on
>> the brief exchange we had at the ONAP meeting in Santa Clara where we
>> talked about the possibility of scaling an ODL cluster without
>> reconfiguring all the nodes and restarting them.  I’ve been unable to find
>> documentation that reflects this yet, but if I recall correctly, you knew
>> the ODL contributor(s) that worked on this feature set.  Could you help me
>> connect with them to learn more about this?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> *James MacNider*
>>
>> Software Architect
>>
>>
>>
>> Open Network Division
>>
>> Amdocs Technology
>>
>> (office) (613)-595-5213
>>
>>
>> *Amdocs* is a Platinum Member of ONAP <https://onap.org/>
>>
>>
>>
>
>


Re: [controller-dev] [Opendaylight-users] More Nitrogen weirdness

2017-12-18 Thread Tom Pantelis
On Mon, Dec 18, 2017 at 2:03 PM, Ryan Dietrich 
wrote:

> The archetype has issues - I pushed
> https://git.opendaylight.org/gerrit/#/c/66545/ to address the ones you encountered
> and others I found. coretutorials is an abortion - it was started with good
> intentions but not followed through.
>
> AFAIK the toaster example in the controller project along with
> https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:Toaster_Step-By-Step
> are up-to-date and not broken.
>
>
> How do I re-test the change you just pushed (I assume a version bump of
> some sort?)
>

That patch is on master - you can clone master and then cherry-pick that
patch.


>
> -Ryan Dietrich
>


Re: [controller-dev] [Opendaylight-users] More Nitrogen weirdness

2017-12-17 Thread Tom Pantelis
On Sun, Dec 17, 2017 at 10:15 PM, Tom Pantelis <tompante...@gmail.com>
wrote:

>
>
> On Sun, Dec 17, 2017 at 7:50 PM, Ryan Dietrich <r...@betterservers.com>
> wrote:
>
>> I made a screencast showing exactly what I am doing.  Please advise?
>>>
>>> https://asciinema.org/a/R5Oo9BYsPuQMBYfpjGI29D1fM
>>>
>>> I have never seen “ExampleProvider Session Initiated”, no matter what I
>>> do.  It feels like the most basic “hello world” program doesn’t even work
>>> with Nitrogen anymore :(
>>>
>>>
>> There are a couple of reasons you don't see that message. First, none of the
>> example features are specified in the featuresBoot property
>> in etc/org.apache.karaf.features.cfg so none of the generated example
>> features are installed on startup - that's b/c this line is commented out
>> in karaf/pom.xml:
>>
>>
>> First, thanks for getting back to me on this.  I was pretty impressed
>> with asciinema, going to use it a lot more in the future (did you see you
>> can copy/paste from the video while it is playing!?  That is super cool!)
>>
>> I see the file: features.cfg.  In it I see this section that refers to
>> featuresBoot
>>
>> featuresBoot = \
>> standard, \
>> wrap
>>
>> Should it include something related to old-example?  How do I know what
>> to put there?
>>
>> 
>>
>> 
>>
>>
>> I’m guessing uncommenting this without fixing the features.cfg file isn’t
>> going to do anything?
>>
>>
So the karaf.localFeature property in the pom causes the
maven-deploy-plugin to include that feature in the featuresBoot line in the
generated distro. However, it won't work due to the issue outlined below, i.e.
the odl-example-rest feature needs to be listed in the generated
features-example feature repo, otherwise Karaf won't know about it at
runtime. But it won't even get that far, as the maven-deploy-plugin will
fail to generate the distro if it can't resolve the karaf.localFeature.


> That's not really an issue other than it no longer matches the wiki
>> content.
>>
>>
>> -Ryan Dietrich
>>
>
>


Re: [controller-dev] [Opendaylight-users] More Nitrogen weirdness

2017-12-17 Thread Tom Pantelis
On Sun, Dec 17, 2017 at 7:50 PM, Ryan Dietrich 
wrote:

> I made a screencast showing exactly what I am doing.  Please advise?
>>
>> https://asciinema.org/a/R5Oo9BYsPuQMBYfpjGI29D1fM
>>
>> I have never seen “ExampleProvider Session Initiated”, no matter what I
>> do.  It feels like the most basic “hello world” program doesn’t even work
>> with Nitrogen anymore :(
>>
>>
> There are a couple of reasons you don't see that message. First, none of the
> example features are specified in the featuresBoot property
> in etc/org.apache.karaf.features.cfg so none of the generated example
> features are installed on startup - that's b/c this line is commented out
> in karaf/pom.xml:
>
>
> First, thanks for getting back to me on this.  I was pretty impressed with
> asciinema, going to use it a lot more in the future (did you see you can
> copy/paste from the video while it is playing!?  That is super cool!)
>
> I see the file: features.cfg.  In it I see this section that refers to
> featuresBoot
>
> featuresBoot = \
> standard, \
> wrap
>
> Should it include something related to old-example?  How do I know what to
> put there?
>
> 
>
> 
>
>
> I’m guessing uncommenting this without fixing the features.cfg file isn’t
> going to do anything?
>
> That's not really an issue other than it no longer matches the wiki
> content.
>
>
> I can follow a step-by-step checklist, I promise!  (missing steps makes it
> a bit harder though)
>
> When you list the features, notice there's only odl-example-api -
> the odl-example feature which includes the example-impl bundle, which of
> course has the ExampleProvider class, is missing (and also the
> odl-example-cli and odl-example-rest features). This is b/c the
> features-example feature repo only includes odl-example-api - the
> features/features-example/pom.xml should list odl-example-rest, which
> pulls in all the example features, instead of odl-example-api as a
> dependency. Unfortunately, this was overlooked when the archetype
> was migrated to Karaf 4.
>
>
> Yeah, this encapsulation that is going on in the features-repo is
> confusing.  I still don’t get how OSGi connects the dots between
> “packaging” that announces itself as a “feature”, a “bundle” or a “pom”.
> The Karaf 4 manual might as well be written in Sanskrit because I can’t
> make it through the first few paragraphs without stack-overflowing.
>
> So, is example totally broken? (similar to toaster and coretutorials)?  Is
> there a simple “hello world”-esque example anywhere in ODL that works with
> Nitrogen SR1?
>
>
The archetype has issues - I pushed
https://git.opendaylight.org/gerrit/#/c/66545/ to address the ones you
encountered and others I found. coretutorials is an abortion - it was
started with good intentions but not followed through.

AFAIK the toaster example in the controller project along with
https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:Toaster_Step-By-Step
are up-to-date and not broken.



> -Ryan Dietrich
>


Re: [controller-dev] [Opendaylight-users] More Nitrogen weirdness

2017-12-16 Thread Tom Pantelis
On Fri, Dec 15, 2017 at 5:27 PM, Ryan Dietrich 
wrote:

> I made a screencast showing exactly what I am doing.  Please advise?
>
> https://asciinema.org/a/R5Oo9BYsPuQMBYfpjGI29D1fM
>
> I have never seen “ExampleProvider Session Initiated”, no matter what I
> do.  It feels like the most basic “hello world” program doesn’t even work
> with Nitrogen anymore :(
>
>
There are a couple of reasons you don't see that message. First, none of the
example features are specified in the featuresBoot property
in etc/org.apache.karaf.features.cfg so none of the generated example
features are installed on startup - that's b/c this line is commented out
in karaf/pom.xml:


   


That's not really an issue other than it no longer matches the wiki content.
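One way to check which features a generated distro will actually install at startup is to read `featuresBoot` from `etc/org.apache.karaf.features.cfg`. A minimal sketch, assuming the file is a plain Java properties file like the `featuresBoot = \ standard, \ wrap` excerpt quoted elsewhere in this thread; it relies on `java.util.Properties`, which already joins backslash-continued lines. It compares plain feature names only, so version-qualified entries would need extra handling, and the `FeaturesBoot` class is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper for inspecting featuresBoot in a Karaf features cfg file. */
final class FeaturesBoot {
    private FeaturesBoot() {
    }

    /** Returns the comma-separated featuresBoot entries, trimmed, in file order. */
    static List<String> entries(Reader cfg) {
        Properties props = new Properties();
        try {
            props.load(cfg); // joins lines ending in '\' and drops their leading blanks
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> result = new ArrayList<>();
        for (String part : props.getProperty("featuresBoot", "").split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    /** True if the given plain feature name appears in featuresBoot. */
    static boolean bootsFeature(Reader cfg, String feature) {
        return entries(cfg).contains(feature);
    }
}
```

For example, `FeaturesBoot.bootsFeature(Files.newBufferedReader(Path.of("etc/org.apache.karaf.features.cfg")), "odl-example-rest")` would report whether the example feature is booted; with the `standard, wrap` value shown in this thread it would return false.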

 When you list the features, notice there's only odl-example-api -
the odl-example feature which includes the example-impl bundle, which of
course has the ExampleProvider class, is missing (and also the
odl-example-cli and odl-example-rest features). This is b/c the
features-example feature repo only includes odl-example-api - the
features/features-example/pom.xml should list odl-example-rest, which pulls
in all the example features, instead of odl-example-api as a dependency.
Unfortunately, this was overlooked when the archetype was migrated to
Karaf 4.


Re: [controller-dev] How about reducing WARN to DEBUG for ConflictingModificationAppliedException in ConcurrentDOMDataBroker?

2017-12-11 Thread Tom Pantelis
On Mon, Dec 11, 2017 at 5:28 PM, Michael Vorburger <vorbur...@redhat.com>
wrote:

> On Mon, Dec 11, 2017 at 8:36 PM, Tom Pantelis <tompante...@gmail.com>
> wrote:
>
>> On Mon, Dec 11, 2017 at 2:22 PM, Michael Vorburger <vorbur...@redhat.com>
>> wrote:
>>
>>> Controllers,
>>>
>>> looking at https://jira.opendaylight.org/browse/NETVIRT-916, half of
>>> the solution could be https://git.opendaylight.org/gerrit/#/c/66355/,
>>> and I am wondering if the other half of the solution is in controller's
>>> ConcurrentDOMDataBroker:
>>>
>>> Is it right for it to LOG.warn a ConflictingModificationAppliedException?
>>> Shouldn't that be left to the caller? Given that a failed Future is
>>> returned, why log it from controller? Because people could just ignore the
>>> Future? IMHO we now have a solution for this via the @CheckReturnValue
>>> which error-prone (and maybe FindBugs, I'm not sure) can verify. Only for
>>> projects enforcing such tools, of course. And only if we
>>> add @CheckReturnValue to WriteTransaction submit() which I think we should
>>> - OK for everyone?
>>>
>>> Similarly for an OptimisticLockFailedException (not the case in
>>> NETVIRT-916, but just while we're at it) - that IMHO also should not be WARN
>>> logged by controller (if it currently is; dunno).
>>>
>>
>> Yeah I think that was put in case callers ignore the returned Future. I
>> agree it mostly adds a lot of extra noise - I'm fine with lowering it to
>> DEBUG.
>>
>
> OK, great; then I've just so proposed this in a few changes on
> https://git.opendaylight.org/gerrit/#/q/topic:CONTROLLER-1802
>
> What may still be TBD is the equivalent of
> https://git.opendaylight.org/gerrit/#/c/66362/ in mdsal, or is there no such thing?
>

ConcurrentDOMDataBroker is CDS's broker implementation
(sal-distributed-datastore) and is the one used in production.


Re: [controller-dev] How about reducing WARN to DEBUG for ConflictingModificationAppliedException in ConcurrentDOMDataBroker?

2017-12-11 Thread Tom Pantelis
On Mon, Dec 11, 2017 at 2:22 PM, Michael Vorburger 
wrote:

> Controllers,
>
> looking at https://jira.opendaylight.org/browse/NETVIRT-916, half of the
> solution could be https://git.opendaylight.org/gerrit/#/c/66355/, and I am
> wondering if the other half of the solution is in controller's
> ConcurrentDOMDataBroker:
>
> Is it right for it to LOG.warn a ConflictingModificationAppliedException?
> Shouldn't that be left to the caller? Given that a failed Future is
> returned, why log it from controller? Because people could just ignore the
> Future? IMHO we now have a solution for this via the @CheckReturnValue
> which error-prone (and maybe FindBugs, I'm not sure) can verify. Only for
> projects enforcing such tools, of course. And only if we
> add @CheckReturnValue to WriteTransaction submit() which I think we should
> - OK for everyone?
>
> Similarly for an OptimisticLockFailedException (not the case in
> NETVIRT-916, but just while we're at it) - that IMHO also should not be WARN
> logged by controller (if it currently is; dunno).
>

Yeah I think that was put in case callers ignore the returned Future. I
agree it mostly adds a lot of extra noise - I'm fine with lowering it to
DEBUG.
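
The point of this exchange is that a failed commit is already reported through the returned future, so the broker's own WARN line mostly duplicates it. A minimal, stdlib-only sketch of that split of responsibilities, assuming `CompletableFuture` in place of the real MD-SAL future types and a hypothetical `SketchConflictException` in place of `ConflictingModificationAppliedException`: the "broker" fails the future and logs only at `FINE` (roughly DEBUG), and the caller decides how loudly to log. Error-prone's `@CheckReturnValue`, if placed on `submit()`, would additionally flag callers that drop the returned future; it is left out here so the sketch has no dependencies.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for a commit failure such as ConflictingModificationAppliedException. */
class SketchConflictException extends Exception {
    SketchConflictException(String message) {
        super(message);
    }
}

final class CommitLoggingSketch {
    private static final Logger LOG = Logger.getLogger(CommitLoggingSketch.class.getName());

    private CommitLoggingSketch() {
    }

    /** "Broker" side: fail the returned future and log only at FINE (roughly DEBUG). */
    static CompletableFuture<Void> submit(boolean conflict) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        if (conflict) {
            SketchConflictException ex =
                new SketchConflictException("Node was created by other transaction.");
            LOG.log(Level.FINE, "Commit failed", ex);
            future.completeExceptionally(ex);
        } else {
            future.complete(null);
        }
        return future;
    }

    /** Caller side: only the caller knows whether a conflict matters, so it picks the log level. */
    static String handle(CompletableFuture<Void> future) {
        return future.handle((ignored, failure) -> {
            if (failure == null) {
                return "committed";
            }
            // Dependent stages wrap failures in CompletionException; unwrap to the real cause.
            Throwable cause = failure instanceof CompletionException && failure.getCause() != null
                ? failure.getCause() : failure;
            LOG.log(Level.WARNING, "Commit failed: " + cause.getMessage());
            return "failed: " + cause.getClass().getSimpleName();
        }).join();
    }
}
```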


>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>
>
>


Re: [controller-dev] How to get a unique ID for a Cluster Node in application code? (GENIUS-98)

2017-11-23 Thread Tom Pantelis
On Thu, Nov 23, 2017 at 10:44 AM, Michael Vorburger 
wrote:

> Hello,
>
> does anyone here have an idea / suggestion re. what would be the best API
> to use to obtain a unique ID for a Cluster Node in ODL?
>
> In https://jira.opendaylight.org/projects/GENIUS/issues/GENIUS-98 I had
> complained that the "InetAddresses.coerceToInteger(InetAddress.getLocalHost())"
> used in a few places in genius and netvirt may not be ideal - but I don't
> really have a better alternative either... do you?
>

The role name configured in the akka.conf would be a good candidate. This
can be obtained via an akka API or via JMX.
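
A minimal sketch of the JMX route, using only the platform MBean server. The object name and the `MemberName` attribute below are assumptions about how the CDS shard manager registers itself, so confirm them (e.g. with jconsole) on a running node before relying on them. The role name itself is the one listed under `roles` in the cluster section of akka.conf.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class MemberNameLookup {
    // Assumption: CDS shard manager MBean name; verify on a running node.
    static final String SHARD_MANAGER_MBEAN =
        "org.opendaylight.controller:type=DistributedConfigDatastore,"
            + "Category=ShardManager,name=shard-manager-config";

    private MemberNameLookup() {
    }

    /** Reads a String attribute from the platform MBean server, or returns null if unavailable. */
    static String readStringAttribute(String objectName, String attribute) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            Object value = server.getAttribute(new ObjectName(objectName), attribute);
            return value instanceof String ? (String) value : null;
        } catch (JMException e) {
            // Malformed name, MBean not registered, attribute missing, etc.
            return null;
        }
    }
}
```

Outside a running ODL node the shard manager MBean does not exist, so the lookup returns null instead of throwing; callers can fall back to another identifier in that case.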


>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>
>
>


Re: [controller-dev] Should application code persist do retries on TransactionCommitFailedException caused by AskTimeoutException or could CDS be configured to retry more?

2017-11-08 Thread Tom Pantelis
The new tell-based protocol in CDS adds internal retries for transactions.
It is not the default yet (https://git.opendaylight.org/gerrit/#/c/61002/).
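
Until the tell-based protocol is the default, an application that chooses to retry could use a bounded helper like the one below. This is a stdlib-only sketch, not a CDS API: `java.util.concurrent.TimeoutException` stands in for `akka.pattern.AskTimeoutException`, and the retried operation is assumed to be idempotent, because (as question 3 in the quoted mail suggests) a commit that reported a timeout may still have been applied.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class CommitRetrySketch {
    private CommitRetrySketch() {
    }

    /** True if any cause in the chain is a TimeoutException (stand-in for AskTimeoutException). */
    static boolean isTimeout(Throwable failure) {
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof TimeoutException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs a commit attempt up to maxAttempts times, retrying only timeout-caused failures.
     * Each attempt must be idempotent: a commit that timed out may still have been applied.
     */
    static <T> T commitWithRetry(Supplier<CompletableFuture<T>> attempt, int maxAttempts)
            throws ExecutionException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ExecutionException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.get().get();
            } catch (ExecutionException e) {
                if (!isTimeout(e.getCause())) {
                    throw e; // e.g. conflicts are not retried blindly here
                }
                last = e;
            }
        }
        throw last;
    }
}
```

Retrying only on timeouts keeps genuine conflicts visible to the caller, which is the case where blind retries would hide a real problem.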

On Wed, Nov 8, 2017 at 3:20 PM, Michael Vorburger 
wrote:

> Tom and other controllerians,
>
> While code reviewing https://git.opendaylight.org/gerrit/#/c/61526/ for
> https://jira.opendaylight.org/browse/GENIUS-86, I learnt that, apparently
> (quote) "in scale testing, there are too many writes and reads over the
> network, and sometimes these AskTimeout exceptions occur due to the load,
> it is just that for sometime we are not able to reach the other side, but
> the nodes are all healthy, and it comes back soon", and wanted to know:
>
> 1. is this still the case, or is that propose change to master for some
> known old problem that was meanwhile fixed in controller CDS infra?
>
> 2. does it seem right to you that application code handles this? Like
> wouldn't it be better if there was some configuration knob somewhere in
> controller CDS to increase whatever timeout or retry counter is behind when
> these TransactionCommitFailedException caused by akka.pattern.AskTimeoutException
> occur, to tune it to try harder/longer, and not throw any
> TransactionCommitFailed?
>
> 3. when these do occur, is there really a "scenario where even though the
> transaction throws a TransactionCommitFailedException (caused by
> akka.pattern.AskTimeoutException) it eventually succeeds" ? That's what
> in c/61526 is being proposed to be added to the DataBrokerFailures test
> utility, to test such logic in application code... in
> DataBrokerFailuresImpl, it simulates a submit() that actually did go
> through and changed the DS (line 95 super.submit().get()) but then return
> immediateFailedCheckedFuture(submitException) anyway. Is that really what
> (under this scenario) could happen IRL at prod from CDS? That seems...
> weird, curious - so its transactions are not really (always)
> transactionally to be trusted? ;)
>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>
>
>


Re: [controller-dev] ODL crashing in CSIT jobs

2017-10-30 Thread Tom Pantelis
On Mon, Oct 30, 2017 at 4:25 PM, Sam Hague <sha...@redhat.com> wrote:

>
>
> On Mon, Oct 30, 2017 at 3:02 PM, Tom Pantelis <tompante...@gmail.com>
> wrote:
>
>>
>>
>> On Mon, Oct 30, 2017 at 2:49 PM, Michael Vorburger <vorbur...@redhat.com>
>> wrote:
>>
>>> Hi Sam,
>>>
>>> On Mon, Oct 30, 2017 at 7:45 PM, Sam Hague <sha...@redhat.com> wrote:
>>>
>>>> Stephen, Michael, Tom,
>>>>
>>>> do you have any ways to collect debugs when ODL crashes in CSIT?
>>>>
>>>
>>> JVMs (almost) never "just crash" without a word... either some code
>>> does java.lang.System.exit(), which you may remember we do in the CDS/Akka
>>> code somewhere, or there's a bug in the JVM implementation - in which case
>>> there should be one of those JVM crash log type things - a file named
>>> something like hs_err_pid22607.log in the "current working" directory.
>>> Where would that be on these CSIT runs, and are the CSIT JJB jobs set up to
>>> preserve such JVM crash log files and copy them over to
>>> logs.opendaylight.org ?
>>>
>>
>> Akka will do System.exit() if it encounters an error serious enough for that.
>> But it doesn't do it silently. However I believe we disabled the automatic
>> exiting in akka.
>>
> Should there be any logs in ODL for this? There is nothing in the karaf
> log when this happens. It literally just stops.
>
> The karaf.console log does say the karaf process was killed:
>
> /tmp/karaf-0.7.1-SNAPSHOT/bin/karaf: line 422: 11528 Killed ${KARAF_EXEC}
> "${JAVA}" ${JAVA_OPTS} "$NON_BLOCKING_PRNG" 
> -Djava.endorsed.dirs="${JAVA_ENDORSED_DIRS}"
> -Djava.ext.dirs="${JAVA_EXT_DIRS}" -Dkaraf.instances="${KARAF_HOME}/instances"
> -Dkaraf.home="${KARAF_HOME}" -Dkaraf.base="${KARAF_BASE}"
> -Dkaraf.data="${KARAF_DATA}" -Dkaraf.etc="${KARAF_ETC}"
> -Dkaraf.restart.jvm.supported=true -Djava.io.tmpdir="${KARAF_DATA}/tmp"
> -Djava.util.logging.config.file="${KARAF_BASE}/etc/java.util.logging.properties"
> ${KARAF_SYSTEM_OPTS} ${KARAF_OPTS} ${OPTS} "$@" -classpath "${CLASSPATH}"
> ${MAIN}
>
> In the CSIT robot files we can see the below connection errors so ODL is
> not responding to new requests. This plus the above leads me to think ODL
> just died.
>
> [ WARN ] Retrying (Retry(total=2, connect=None, read=None, redirect=None,
> status=None)) after connection broken by 'NewConnectionError('<
> requests.packages.urllib3.connection.HTTPConnection object at 0x5ca2d50>:
> Failed to establish a new connection: [Errno 111] Connection refused',)'
>
>>
>>
That would seem to indicate something did a kill -9.  As Michael said, if
the JVM crashed there would be an hs_err_pid file and it would log a
message about it.
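
For triage: the "Killed" line in karaf.console is what bash prints when its child is terminated by SIGKILL, and shells report a process killed by signal N as exit status 128 + N, so a kill -9 shows up as 137. A genuine JVM crash would instead leave an hs_err_pid<pid>.log file, whose location can be set with HotSpot's -XX:ErrorFile option. A small sketch under those assumptions; the class and method names are illustrative only.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class CrashTriage {
    private CrashTriage() {
    }

    /**
     * Shells report a child killed by signal N as exit status 128 + N.
     * Returns N, or -1 if the status does not look like a signal exit. This is a
     * heuristic: a program may also exit with a value above 128 on its own.
     */
    static int signalFromExitStatus(int status) {
        return status > 128 && status < 256 ? status - 128 : -1;
    }

    /** Lists HotSpot fatal-error logs (hs_err_pid<pid>.log) directly inside dir. */
    static List<Path> findHsErrLogs(Path dir) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "hs_err_pid*.log")) {
            for (Path p : stream) {
                found.add(p);
            }
        }
        return found;
    }
}
```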


Re: [controller-dev] ODL crashing in CSIT jobs

2017-10-30 Thread Tom Pantelis
On Mon, Oct 30, 2017 at 2:49 PM, Michael Vorburger 
wrote:

> Hi Sam,
>
> On Mon, Oct 30, 2017 at 7:45 PM, Sam Hague  wrote:
>
>> Stephen, Michael, Tom,
>>
>> do you have any ways to collect debugs when ODL crashes in CSIT?
>>
>
> JVMs (almost) never "just crash" without a word... either some code
> does java.lang.System.exit(), which you may remember we do in the CDS/Akka
> code somewhere, or there's a bug in the JVM implementation - in which case
> there should be a one of those JVM crash logs type things - a file named
> something like hs_err_pid22607.log in the "current working" directory.
> Where would that be on these CSIT runs, and are the CSIT JJB jobs set up to
> preserve such JVM crash log files and copy them over to
> logs.opendaylight.org ?
>

Akka will do System.exit() if it encounters an error serious enough for that. But
it doesn't do it silently. However I believe we disabled the automatic
exiting in akka.


>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>
>
>
>>
>> We have a number of jobs [1] that have recently started to crash. ODL
>> just goes away in the middle of the job. No warnings or exceptions. This
>> seems to only happen with nitrogen and oxygen so it leads me to believe it
>> is a recent patch in something core.
>>
>> Thanks, Sam
>>
>> [1] https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-1node-openstack-ocata-upstream-stateful-nitrogen/319/odl1_karaf.log.gz
>>
>
>


Re: [controller-dev] [unimgr-dev] SingleFeatureTest (SFT) failure on odl-integration-compatible-with-all due to ConflictingModificationAppliedException: Node was created by other transaction.

2017-10-26 Thread Tom Pantelis
On Thu, Oct 26, 2017 at 9:37 AM, Michael Vorburger <vorbur...@redhat.com>
wrote:

> Hi Donald,
>
> On Thu, Oct 26, 2017 at 3:02 PM, Donald Hunter (donaldh) <
> dona...@cisco.com> wrote:
>
>> Hi Michael,
>>
>>
>>
>> Please don’t just remove unimgr. That’s not helpful to unimgr. The first
>> I heard there was a problem relating to unimgr was yesterday when I read
>> your email after you forwarded it to the unimgr-dev list.
>>
>>
>>
>> What’s the problem that you think is caused by unimgr and I’ll see if we
>> can resolve it.
>>
>
> I've no idea actually - but only based on
> https://jira.opendaylight.org/browse/NETCONF-479, my understanding is that Tomas Cere and Vratko Polak
> understand this better.
>
> My main point really is that, whatever needs to be done in unimgr and/or
> netconf (which personally I unfortunately don't have the cycles to help out
> more with), is not an excuse to keep failing distribution build jobs for
> many, many other projects... I'd again like to propose to TEMPORARILY
> (!) remove unimgr from distribution, until this is solved - what's the harm?
>


I agree we need a workaround now. I believe SFT fails b/c some netconf
bundle fails on BP startup so perhaps this could also be alleviated by not
failing fast. I can push a patch if someone knows or could point me towards
the offending code.

>
>
>
>> Cheers,
>>
>> Donald.
>>
>>
>>
>> *From: *<unimgr-dev-boun...@lists.opendaylight.org> on behalf of Michael
>> Vorburger <vorbur...@redhat.com>
>> *Date: *Thursday, 26 October 2017 at 12:44
>> *To: *"Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco)" <
>> vrpo...@cisco.com>, "Tomas Cere -X (tcere - PANTHEON TECHNOLOGIES at
>> Cisco)" <tc...@cisco.com>
>> *Cc: *controller-dev <controller-dev@lists.opendaylight.org>, "
>> unimgr-...@lists.opendaylight.org" <unimgr-...@lists.opendaylight.org>,
>> Tom Pantelis <tompante...@gmail.com>
>> *Subject: *Re: [unimgr-dev] [controller-dev] SingleFeatureTest (SFT)
>> failure on odl-integration-compatible-with-all due to
>> ConflictingModificationAppliedException: Node was created by other
>> transaction.
>>
>>
>>
>> On Tue, Oct 24, 2017 at 11:44 PM, Michael Vorburger <vorbur...@redhat.com>
>> wrote:
>>
>> +unimgr-dev:
>>
>>
>>
>> On Mon, Oct 23, 2017 at 6:02 PM, Vratko Polak -X (vrpolak - PANTHEON
>> TECHNOLOGIES at Cisco) <vrpo...@cisco.com> wrote:
>>
>> Previous story: [2].
>>
>>
>>
>> It's not that rare - just hit me again (on
>> https://git.opendaylight.org/gerrit/#/c/64674/), had to override once more - and I find this
>> annoying.
>>
>>
>>
>> > who would have to do what
>>
>>
>>
>> Ideally, Netconf developers would unify their features,
>>
>> which does not seem to get done anytime soon [3].
>>
>>
>> If I understand [3] correctly, Tomas Cere doesn't even consider this a
>> netconf issue, but asks for "unimgr should move towards
>> odl-netconf-topology" (instead of odl-netconf-connector-ssh, because "There
>> is no reason to pull in odl-netconf-connector-ssh unless you are using
>> config subsystem still"). Is this something the unimgr project would be
>> willing to do?
>>
>>
>>
>> If not, or if unimgr-dev, assuming I understand things correctly, why
>> don't we just kick unimgr project out of distribution?! I'll raise a patch
>> proposing this when it next hits me, if it's not resolved by then.
>>
>>
>>
>> this just happened AGAIN on
>> https://jenkins.opendaylight.org/releng/job/genius-distribution-check-oxygen/481/console for
>> https://git.opendaylight.org/gerrit/#/c/60303/ ... it's a real PITA IMHO!
>>
>>
>>
>> I'd therefore like to suggest
>> https://git.opendaylight.org/gerrit/#/c/64761/ - objections, anyone?
>>
>>
>>
>> There is a workaround in Int/Dist [4] prepared,
>>
>> but it keeps SFT unstable, this time due to
>>
>> (lack of) Karaf 4 memory efficiency [5].
>>
>>
>>
>> Current road to stability seems to be
>>
>> fixing various ODL project features (like [6])
>>
>> to be less taxing on Karaf 4 bundle resolver,
>>
>> and then merging [4].
>>
>>
>>
>> > It would be nice if the exception included some context like the path.
>>
>>
>>
>> I have rebased my old [7].
>>
>>

Re: [controller-dev] Expose Datastore health to applications via infrautils.diagstatus

2017-10-23 Thread Tom Pantelis
On Mon, Oct 23, 2017 at 2:34 PM, Robert Varga <n...@hq.sk> wrote:

> On 23/10/17 14:37, Tom Pantelis wrote:
> > Or we get infrautils promoted to "mature" to get around the red tape.
> > What would that take? ...
>
> A Graduation Review. Unfortunately
> https://www.opendaylight.org/project-lifecycle-releases disappeared
> somewhere (Casey, do you know where?), but it includes things like:
> - clear scope
> - history of following the mature release cycle
> etc.
>
>
If it disappeared then how important can it be now :) ...  These rules and
bureaucracy were put into place a while ago when ODL had a lot more
participation. Along with Michael, I question whether it's really relevant
anymore...


> Bye,
> Robert
>
>


Re: [controller-dev] SingleFeatureTest (SFT) failure on odl-integration-compatible-with-all due to ConflictingModificationAppliedException: Node was created by other transaction.

2017-10-23 Thread Tom Pantelis
On Mon, Oct 23, 2017 at 8:41 AM, Michael Vorburger 
wrote:

> Hello,
>
> Any idea who would have to do what to prevent SFT from (only
> occasionally?!) failing on odl-integration-compatible-with-all due to
> ConflictingModificationAppliedException: Node was created by other
> transaction, as seen on
> https://logs.opendaylight.org/releng/jenkins092/infrautils-distribution-check-oxygen/144/console.log.gz for
> https://git.opendaylight.org/gerrit/#/c/63466/ ?
>

It would be nice if the exception included some context like the path. Does
this test install all netconf features? I've seen such sporadic issues with
the callhome feature when it's installed with all the other netconf
features.

How does SFT even pick this up to fail the test? Log scraping?  Or was it a
"caused by" thrown on blueprint startup?
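
A minimal sketch of what "include some context like the path" could look like, assuming the path is available as plain string segments. The real yangtools exceptions are, as far as I know, built around `YangInstanceIdentifier`, which this hypothetical `PathAwareConflictException` does not model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical commit-conflict exception that carries the data tree path it refers to. */
class PathAwareConflictException extends Exception {
    private final List<String> path;

    PathAwareConflictException(List<String> path, String message) {
        // Put the path in the message so it shows up in logs and SFT output without extra code.
        super(message + " (path: /" + String.join("/", path) + ")");
        this.path = Collections.unmodifiableList(new ArrayList<>(path));
    }

    List<String> getPath() {
        return path;
    }
}
```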


>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>
>
>


Re: [controller-dev] Expose Datastore health to applications via infrautils.diagstatus

2017-10-23 Thread Tom Pantelis
On Mon, Oct 23, 2017 at 8:35 AM, Tom Pantelis <tompante...@gmail.com> wrote:

>
>
> On Mon, Oct 23, 2017 at 5:35 AM, Faseela K <faseel...@ericsson.com> wrote:
>
>> Hi all,
>>
>>
>>
>>Thanks for reviewing the patch, and giving comments.
>>
>>But there is a comment from Robert that this adds dependency of a
>> mature project to an incubation project :) Would like to know whether it
>> is completely not possible. In that case, we have to find out other ways to
>> achieve this.
>>
>
> I don't really know the rules/philosophies/history with incubation
> projects and dependencies and what it takes or means to be "mature" (or if
> it really matters anymore with ODL). However I don't think we should let
> bureaucracy impede progress so I'm fine with the dependency. We should be
> able to freely use infrautils - prior to it we used yangtools as a kind of
> dumping ground for generic components (that had nothing to do with yang)
> b/c we had nowhere else to put them. infrautils *should* serve that
> purpose now. But if it's a showstopper then the proposed
> DatastoreStatusMonitor could actually reside anywhere since it just uses JMX.
>
>


Or we get infrautils promoted to "mature" to get around the red tape. What
would that take? ...


>
>>
>> Thanks,
>>
>> Faseela
>>
>>
>>
>> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
>> *Sent:* Thursday, October 12, 2017 4:28 PM
>> *To:* Faseela K <faseel...@ericsson.com>
>> *Cc:* Anil Vishnoi <vishnoia...@gmail.com>; Muthukumaran K <
>> muthukumara...@ericsson.com>; infrautils-...@lists.opendaylight.org;
>> controller-dev@lists.opendaylight.org; R Srinivasan E <
>> r.e.sriniva...@ericsson.com>; Dayavanti Gopal Kamath <
>> dayavanti.gopal.kam...@ericsson.com>
>> *Subject:* Re: [controller-dev] Expose Datastore health to applications
>> via infrautils.diagstatus
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Oct 12, 2017 at 6:36 AM, Faseela K <faseel...@ericsson.com>
>> wrote:
>>
>>
>>
>>
>>
>> So here is how diagstatus module works – any application should register
>> as a “service” with the framework, report an initial status (using the APIs
>> provided by diagstatus).
>>
>> There is another OsgiService “ServiceStatusProvider” exposed, and if
>> applications implement the same, that will be called every time an external
>> request is made to get the current service status.
>>
>> In looking at the API, it appears an app would register with the
>> DiagStatusService and invoke report each time its status changes. An app
>> can also register a ServiceStatusProvider to report its status when
>> queried. It seems this is an alternative to interacting with the
>> DiagStatusService in looking at the DiagStatusServiceImpl which always
>> calls updateServiceStatusMap to query the ServiceStatusProviders from the
>> get* methods. Given that, why would an app need to explicitly register and
>> push its status to the DiagStatusService? Why not just advertise a
>> ServiceStatusProvider? This seems simpler. In that case,
>> DiagStatusServiceImpl doesn't need to maintain the statusMap - it would
>> just query the ServiceStatusProvider(s) on demand. Or am I missing
>> something?
>>
>>
>>
>> For services like “DATASTORE” only the pull model is required, just
>> register the service and implement ServiceStatusProvider.
>>
>> There are some usecases in genius, where a push model was preferred, and
>> hence we have kept both the options open.
>>
>>
>>
>> OK.  By "just register the service" I assume you mean just advertise a
>> ServiceStatusProvider OSGi service. It is not necessary to explicitly
>> register with the DiagStatusService as that is implicit by advertising a
>> ServiceStatusProvider.
>>
>>
>>
>> The code in DiagStatusServiceImpl does not enforce explicit registration
>> - one can just call report w/o a prior register call - not sure if that was
>> the original intent.  Similarly a ServiceStatusProvider's status is
>> reported even if it didn't explicitly call register.
>>
>>
>>
>> Right Tom, the original intent was to allow only services who do explicit
>> registration. But it is not enforced yet, wanted to get inputs on how the
>> apps would be interested to go about this. Michael recently modified the
>> implementation to allow deregistration only for those who act

Re: [controller-dev] Expose Datastore health to applications via infrautils.diagstatus

2017-10-23 Thread Tom Pantelis
On Mon, Oct 23, 2017 at 5:35 AM, Faseela K <faseel...@ericsson.com> wrote:

> Hi all,
>
>
>
>Thanks for reviewing the patch, and giving comments.
>
>But there is a comment from Robert that this adds dependency of a
> mature project to an incubation project :) Would like to know whether it
> is completely not possible. In that case, we have to find out other ways to
> achieve this.
>

I don't really know the rules/philosophies/history with incubation projects
and dependencies and what it takes or means to be "mature" (or if it really
matters anymore with ODL). However I don't think we should let
bureaucracy impede progress so I'm fine with the dependency. We should be
able to freely use infrautils - prior to it we used yangtools as a kind of
dumping ground for generic components (that had nothing to do with yang)
b/c we had nowhere else to put them. infrautils *should* serve that
purpose now. But if it's a showstopper then the
proposed DatastoreStatusMonitor could actually reside anywhere since it
just uses JMX.
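
Because the proposed DatastoreStatusMonitor only reads JMX, it fits the pull model discussed in the quoted exchange: status is computed when queried instead of being pushed. A minimal sketch with hypothetical names (`PullStatusMonitor`, `SketchServiceState`), not the infrautils API; the two `BooleanSupplier`s stand in for whatever JMX reads a real monitor would do, and mapping "started but not in sync" to ERROR is a choice made for the sketch.

```java
import java.util.function.BooleanSupplier;

/** Illustrative states only; infrautils.diagstatus defines its own status type. */
enum SketchServiceState { STARTING, OPERATIONAL, ERROR }

/**
 * Pull-model monitor: the status is computed each time it is queried, so nothing
 * is cached and nothing has to be pushed. In a real DatastoreStatusMonitor the
 * suppliers might read datastore MBean attributes over JMX (an assumption here).
 */
final class PullStatusMonitor {
    private final BooleanSupplier started;
    private final BooleanSupplier synced;

    PullStatusMonitor(BooleanSupplier started, BooleanSupplier synced) {
        this.started = started;
        this.synced = synced;
    }

    SketchServiceState currentState() {
        if (!started.getAsBoolean()) {
            return SketchServiceState.STARTING;
        }
        return synced.getAsBoolean() ? SketchServiceState.OPERATIONAL : SketchServiceState.ERROR;
    }
}
```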


>
>
> Thanks,
>
> Faseela
>
>
>
> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
> *Sent:* Thursday, October 12, 2017 4:28 PM
> *To:* Faseela K <faseel...@ericsson.com>
> *Cc:* Anil Vishnoi <vishnoia...@gmail.com>; Muthukumaran K <
> muthukumara...@ericsson.com>; infrautils-...@lists.opendaylight.org;
> controller-dev@lists.opendaylight.org; R Srinivasan E <
> r.e.sriniva...@ericsson.com>; Dayavanti Gopal Kamath <
> dayavanti.gopal.kam...@ericsson.com>
> *Subject:* Re: [controller-dev] Expose Datastore health to applications
> via infrautils.diagstatus
>
>
>
>
>
>
>
> On Thu, Oct 12, 2017 at 6:36 AM, Faseela K <faseel...@ericsson.com> wrote:
>
>
>
>
>
> So here is how diagstatus module works – any application should register
> as a “service” with the framework, report an initial status (using the APIs
> provided by diagstatus).
>
> There is another OsgiService “ServiceStatusProvider” exposed, and if
> applications implement the same, that will be called every time an external
> request is made to get the current service status.
>
> In looking at the API, it appears an app would register with the
> DiagStatusService and invoke report each time its status changes. An app
> can also register a ServiceStatusProvider to report its status when
> queried. It seems this is an alternative to interacting with the
> DiagStatusService in looking at the DiagStatusServiceImpl which always
> calls updateServiceStatusMap to query the ServiceStatusProviders from the
> get* methods. Given that, why would an app need to explicitly register and
> push its status to the DiagStatusService? Why not just advertise a
> ServiceStatusProvider? This seems simpler. In that case,
> DiagStatusServiceImpl doesn't need to maintain the statusMap - it would
> just query the ServiceStatusProvider(s) on demand. Or am I missing
> something?
>
>
>
> For services like “DATASTORE” only the pull model is required, just
> register the service and implement ServiceStatusProvider.
>
> There are some usecases in genius, where a push model was preferred, and
> hence we have kept both the options open.
>
>
>
> OK.  By "just register the service" I assume you mean just advertise a
> ServiceStatusProvider OSGi service. It is not necessary to explicitly
> register with the DiagStatusService as that is implicit by advertising a
> ServiceStatusProvider.
>
>
>
> The code in DiagStatusServiceImpl does not enforce explicit registration -
> one can just call report w/o a prior register call - not sure if that was
> the original intent.  Similarly a ServiceStatusProvider's status is
> reported even if it didn't explicitly call register.
>
>
>
> Right Tom, the original intent was to allow only services who do explicit
> registration. But it is not enforced yet, wanted to get inputs on how the
> apps would be interested to go about this. Michael recently modified the
> implementation to allow deregistration only for those who actually
> registered. We were thinking on enforcing the same everywhere, but just
> thought of sharing the idea to apps before doing the same.
>
>
>
> It seems the only reason for explicit registration would be to remove it
> from being reported on unregistration. But this could also be effected by
> reporting that as a STOPPED status, which might be useful to report. In any
> event, explicit reg/unreg via the DiagStatusService API would only be
> needed/enforced when pushing status.  Advertising a ServiceStatusProvider
> OSGi service is an implicit registration and removal of the OSGi service is
> an implicit unregistration.
>
>
>
>
>
> Thanks,
>
> Faseela
>
>
>
>
>
>
>
>
>
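
The argument in this exchange is that a pull-style provider needs no explicit registration, since advertising the OSGi service is the registration, while explicit register/unregister only matters for pushed status. The sketch below models both paths in one hypothetical registry (plain Java, not the infrautils DiagStatusService API). On unregistration it keeps the entry as STOPPED, a variation of the suggestion to report STOPPED rather than silently dropping a service.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

final class StatusRegistrySketch {
    enum State { STARTING, OPERATIONAL, ERROR, STOPPED }

    private final Map<String, Supplier<State>> providers = new LinkedHashMap<>(); // pull: queried on demand
    private final Map<String, State> pushed = new LinkedHashMap<>();             // push: explicitly registered

    // Pull model: advertising a provider is the registration; withdrawing it is the unregistration.
    synchronized void providerAdded(String service, Supplier<State> provider) {
        providers.put(service, provider);
    }

    synchronized void providerRemoved(String service) {
        providers.remove(service);
    }

    // Push model: registration is explicit and enforced before any report.
    synchronized void register(String service) {
        pushed.putIfAbsent(service, State.STARTING);
    }

    synchronized void report(String service, State state) {
        if (!pushed.containsKey(service)) {
            throw new IllegalStateException(service + " is not registered");
        }
        pushed.put(service, state);
    }

    // Keep the entry visible as STOPPED instead of removing it.
    synchronized void unregister(String service) {
        if (pushed.containsKey(service)) {
            pushed.put(service, State.STOPPED);
        }
    }

    /** Pushed states as last reported, plus pull providers queried right now. */
    synchronized Map<String, State> snapshot() {
        Map<String, State> result = new LinkedHashMap<>(pushed);
        providers.forEach((name, provider) -> result.put(name, provider.get()));
        return result;
    }
}
```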


Re: [controller-dev] [release] OpenDaylight Issue: CONTROLLER-1770 (Bugzilla 9163)

2017-10-20 Thread Tom Pantelis
On Fri, Oct 20, 2017 at 10:31 AM, Kit Lou  wrote:

> Hello Guan,
>
> You submitted this issue (Bugzilla [1], JIRA [2]) a month ago and
> indicated you had a patch ready for review.
>
> Could you please provide a link to the patch?  Your contribution is
> appreciated.
>
> In your opinion, is this a blocker bug (as we are unsure why this bug got
> elevated 2 days ago from normal severity to blocker)?
>
>
It was An Ho that elevated it - we should be asking him.


>
> Best Regards,
> Kit Lou
>
> [1] https://bugs.opendaylight.org/show_bug.cgi?id=9163
>
> [2] https://jira.opendaylight.org/browse/CONTROLLER-1770
>
> ___
> release mailing list
> rele...@lists.opendaylight.org
> https://lists.opendaylight.org/mailman/listinfo/release
>
>


Re: [controller-dev] [OpenDaylight TSC] [release] Nitrogen SR1 Blocker Bug: CONTROLLER-1770

2017-10-19 Thread Tom Pantelis
On Thu, Oct 19, 2017 at 3:05 PM, Jamo Luhrsen <jluhr...@gmail.com> wrote:

> I asked in Jira for them to provide a link to their patch. If they have a
> patch and it's ok to merge after controller committers review it, we can
> just resolve the bug and it will be there for nitro SR1.
>
> if no patch comes, I think the evidence so far suggests we can downgrade
> it from blocker and release SR1 without a fix.
>
>
>
I already downgraded it to Medium in JIRA. The question is why did An all
of a sudden elevate it to Blocker (and after the switch-over) with no
reason... As I said, seems like a mistake somehow. In any event if the
creator pushes their patch soon we can merge it but they said they were
going to push it over a month ago :)


> JamO
>
>
> On 10/19/2017 11:49 AM, Tom Pantelis wrote:
> >
> > On Thu, Oct 19, 2017 at 2:45 PM, Ryan Goulding <ryandgould...@gmail.com <mailto:ryandgould...@gmail.com>> wrote:
> >
> > If I'm not mistaken, from the history it would appear An Ho was the
> > one to escalate the severity to "Blocker" in Bugzilla
> > [0]. An, can you comment on this to clarify why it was elevated?
> >
> >
> > yeah and it was done yesterday after the switch-over I think. Seems odd
> > - seems like a mistake.
> >
> >
> >
> > Thanks!
> >
> > Regards,
> >
> > Ryan Goulding
> >
> > [0] https://bugs.opendaylight.org/show_activity.cgi?id=9163
> >
> > On Thu, Oct 19, 2017 at 2:39 PM, Tom Pantelis <tompante...@gmail.com <mailto:tompante...@gmail.com>> wrote:
> >
> >
> > On Thu, Oct 19, 2017 at 2:29 PM, Kit Lou <klou.exter...@gmail.com <mailto:klou.exter...@gmail.com>> wrote:
> >
> > Good question Tom!  Thanks for your feedback!  What is the
> > likelihood of someone encountering the issue?  Is
> > there a workaround?
> >
> > This issue was just elevated to "Blocker" yesterday in
> > between the JIRA migration.  It was at "Normal" severity
> > in JIRA and "Blocker" in bugzilla.  I had to manually adjust
> > the severity in JIRA to match bugzilla this morning.
> >
> >
> > I don't recall it ever being a blocker in bugzilla - it wasn't a
> > couple days ago when I last looked at the clustering
> > bugs (there haven't been any blockers in a while). I don't know
> > how it all of a sudden got elevated to "Blocker". If
> > you want it to be a blocker then that's fine but, as I
> > mentioned, it's an edge case and has been there all along. The
> > person that created it a while ago stated they had a patch but
> > never pushed it.
> >
> >
> >
> > TSC Members,
> >
> > Should we hold nitrogen SR1 for this issue?
> >
> > Best Regards,
> > Kit
> >
> >
> > On Thu, Oct 19, 2017 at 1:18 PM, Tom Pantelis <tompante...@gmail.com <mailto:tompante...@gmail.com>> wrote:
> >
> >
> >
> > On Thu, Oct 19, 2017 at 2:01 PM, Kit Lou <klou.exter...@gmail.com <mailto:klou.exter...@gmail.com>> wrote:
> >
> > Hi Controller Team,
> >
> > This email is to inform you that there is a blocker
> > bug CONTROLLER-1770 in JIRA [1] against the
> > controller project blocking the upcoming nitrogen
> > SR1 release.
> >
> > Please be ready to help with resolving the issue.
> > Thanks!
> >
> >
> > Why is this a blocker for SR1 when it wasn't a blocker
> > for nitrogen?  Plus this is an edge case that has been
> > there all along.
> >
> >
> >
> > Best Regards,
> > Kit Lou
> >
> > [1] https://jira.opendaylight.org/browse/CONTROLLER-1770
> >
> >
> >
> >
> >
> >
> >
> > ___
> > TSC mailing list
> > t...@lists.opendaylight.org <mailto:t...@lists.opendaylight.org>
> > https://lists.opendaylight.org/mailman/listinfo/tsc
> >
> >
> >
> >
> >
> >
>


Re: [controller-dev] [release] Nitrogen SR1 Blocker Bug: CONTROLLER-1770

2017-10-19 Thread Tom Pantelis
On Thu, Oct 19, 2017 at 2:29 PM, Kit Lou <klou.exter...@gmail.com> wrote:

> Good question Tom!  Thanks for your feedback!  What is the likelihood of
> someone encountering the issue?  Is there a workaround?
>
> This issue was just elevated to "Blocker" yesterday in between the JIRA
> migration.  It was at "Normal" severity in JIRA and "Blocker" in bugzilla.
> I had to manually adjust the severity in JIRA to match bugzilla this
> morning.
>

I don't recall it ever being a blocker in bugzilla - it wasn't a couple
days ago when I last looked at the clustering bugs (there haven't been any
blockers in a while). I don't know how it all of a sudden got elevated to
"Blocker". If you want it to be a blocker then that's fine but, as I
mentioned, it's an edge case and has been there all along. The person that
created it a while ago stated they had a patch but never pushed it.


>
> TSC Members,
>
> Should we hold nitrogen SR1 for this issue?
>
> Best Regards,
> Kit
>
>
> On Thu, Oct 19, 2017 at 1:18 PM, Tom Pantelis <tompante...@gmail.com>
> wrote:
>
>>
>>
>> On Thu, Oct 19, 2017 at 2:01 PM, Kit Lou <klou.exter...@gmail.com> wrote:
>>
>>> Hi Controller Team,
>>>
>>> This email is to inform you that there is a blocker bug CONTROLLER-1770
>>> in JIRA [1] against the controller project blocking the upcoming nitrogen
>>> SR1 release.
>>>
>>> Please be ready to help with resolving the issue.  Thanks!
>>>
>>
>> Why is this a blocker for SR1 when it wasn't a blocker for nitrogen?
>> Plus this is an edge case that has been there all along.
>>
>>
>>>
>>> Best Regards,
>>> Kit Lou
>>>
>>> [1] https://jira.opendaylight.org/browse/CONTROLLER-1770


Re: [controller-dev] Best way to gracefully shutdown Karaf in ODL context

2017-10-15 Thread Tom Pantelis
On Sun, Oct 15, 2017 at 8:47 AM, Muthukumaran K <muthukumara...@ericsson.com
> wrote:

> Hi Tom,
>
>
>
> So, we should still be doing the bundle 0 stop for the quarantine case? I
> presume so because this expectation comes from Akka. Is that right?
>
>
>

Akka doesn't know anything about karaf/bundles; the app just needs to
restart the actor system once it's quarantined. For ODL that also means
restarting all the components that use the actor system, which is easiest
done by restarting the karaf container, i.e. by restarting the framework
bundle (0). However, the refactoring in that patch somehow omitted passing
'0', so it just stops the enclosing bundle (and consequently the actor
system) without restarting anything.
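A minimal, stdlib-only sketch of the difference, for illustration. `Bundle`, `BundleContext` and `RecordingContext` below are hypothetical stand-ins reduced from the OSGi `org.osgi.framework` types. In OSGi, the no-arg `getBundle()` returns the bundle that owns the context, while id 0 is the framework (system) bundle. The callback bodies mirror the snippet quoted later in this thread, not the actual controller sources.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only sketch: Bundle/BundleContext are hypothetical stand-ins for the
// org.osgi.framework types, reduced to what the restart callback needs.
class QuarantineRestartSketch {
    static final long SYSTEM_BUNDLE_ID = 0L; // OSGi framework ("system") bundle

    interface Bundle {
        long getBundleId();
        void stop();
    }

    interface BundleContext {
        Bundle getBundle();          // the bundle that owns this context
        Bundle getBundle(long id);   // any installed bundle, by id
    }

    /** Behaviour after the refactoring: only the enclosing bundle (and its ActorSystem) stops. */
    static Runnable stopEnclosingBundle(BundleContext ctx) {
        return () -> ctx.getBundle().stop();
    }

    /** Intended behaviour: set the property the quoted callback sets, then stop bundle 0. */
    static Runnable restartContainer(BundleContext ctx) {
        return () -> {
            System.setProperty("karaf.restart.jvm", "true");
            ctx.getBundle(SYSTEM_BUNDLE_ID).stop();
        };
    }

    /** Tiny in-memory context that records which bundle ids were stopped. */
    static final class RecordingContext implements BundleContext {
        final long ownId;
        final List<Long> stopped = new ArrayList<>();

        RecordingContext(long ownId) { this.ownId = ownId; }

        @Override public Bundle getBundle() { return getBundle(ownId); }

        @Override public Bundle getBundle(long id) {
            return new Bundle() {
                @Override public long getBundleId() { return id; }
                @Override public void stop() { stopped.add(id); }
            };
        }
    }

    public static void main(String[] args) {
        RecordingContext ctx = new RecordingContext(42); // 42: arbitrary id for the enclosing bundle
        stopEnclosingBundle(ctx).run();
        restartContainer(ctx).run();
        System.out.println(ctx.stopped); // [42, 0]
    }
}
```

The first callback only ever touches the caller's own bundle. The second reaches the framework bundle, which is what restarting the whole container requires.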


> >>> If you want to push a patch to fix it, I'll merge it.
>
> Sure Tom. Will do a local quarantine test with the change and push it.
>
>
>
> Regards
>
> Muthu
>
>
>
>
>
>
>
> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
> *Sent:* Friday, October 13, 2017 6:40 PM
> *To:* Muthukumaran K
> *Cc:* Daniel Farrell; Jamo Luhrsen; controller-dev@lists.opendaylight.org;
> integration-...@lists.opendaylight.org
>
> *Subject:* Re: [controller-dev] Best way to gracefully shutdown Karaf in
> ODL context
>
>
>
>
>
>
>
> On Fri, Oct 13, 2017 at 8:57 AM, Tom Pantelis <tompante...@gmail.com>
> wrote:
>
>
>
>
>
> On Fri, Oct 13, 2017 at 12:59 AM, Muthukumaran K <
> muthukumara...@ericsson.com> wrote:
>
> Thanks a lot for the pointers Daniel and JamO.
>
>
>
> https://git.opendaylight.org/gerrit/gitweb?p=releng/builder.git;a=blob;f=jjb/packaging/stop-odl.sh;h=2e3e7bf15dfbe6e59bddfbfd4ce4805fb47b2a69;hb=refs/heads/master#l27
> which aligns with my thought too :)
>
>
>
> Just a clarification: has there been any situation you can recall where
> the karaf PID lingered abnormally long (beyond 10-15 mins) during the stop
> phase? I have seen this once using the vanilla distro but was never able
> to repro it over the past month or so, even after several day-to-day
> restarts. Maybe it was a local env issue. So I was a bit reserved about
> rolling the approach of stop followed by waiting till the PID vanishes
> into production.
>
>
>
> @Tom, @Robert,
>
>
>
> Not directly related but I will fire away …
>
>
>
> Previously, https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-clustering-commons/src/main/java/org/opendaylight/controller/cluster/common/actor/QuarantinedMonitorActor.java
> used to restart the entire container, and now on master the Quarantined
> state just restarts the ActorSystem. Is my understanding right?
>
>
>
> It restarts the enclosing bundle:
>
>
>
> return QuarantinedMonitorActor.props(() -> {
>     // restart the entire karaf container
>     LOG.warn("Restarting karaf container");
>     System.setProperty("karaf.restart.jvm", "true");
>     bundleContext.getBundle().stop();
> });
>
>
>
> It used to restart bundle 0. Not sure why that was changed.
>
>
>
> Looks like this was inadvertently changed by
> https://git.opendaylight.org/gerrit/#/c/62451/ - it used to be
>
>  bundleContext.getBundle(0).stop();
>
>
>
> If you want to push a patch to fix it, I'll merge it.
>
>
>
>
>
>
>
> Regards
>
> Muthu
>
>
>
>
>
>
>
> *From:* Daniel Farrell [mailto:dfarr...@redhat.com]
> *Sent:* Friday, October 13, 2017 6:19 AM
> *To:* Jamo Luhrsen; Muthukumaran K; controller-dev@lists.opendaylight.org;
> integration-...@lists.opendaylight.org
> *Subject:* Re: [controller-dev] Best way to gracefully shutdown Karaf in
> ODL context
>
>
>
> Hey Muthu,
>
>
>
> Yes, I think you should take a look at the systemd configuration we ship
> in ODL's packages. As far as I know it does a good job of
> starting/stopping/restarting ODL's service.
>
>
>
> https://git.opendaylight.org/gerrit/gitweb?p=integration/packaging.git;a=blob;f=packages/rpm/unitfiles/opendaylight.service;h=ac436592d2880047986b856c7dd6810665ba0d3e;hb=refs/heads/master
>
>
>
> Here's a Nitrogen RPM that contains that systemd config:
>
>
>
> http://cbs.centos.org/repos/nfv7-opendaylight-70-release/x86_64/os/Packages/opendaylight-7.0.0-1.el7.noarch.rpm
>
>
>
> This test job shows examples of `sudo systemctl [start, stop, status]`
> working:
>
>
>
> https://jenkins.opendaylight.org/releng/job/packaging-test-rpm-master
>
>
>
> The logic for tha

Re: [controller-dev] Expose Datastore health to applications via infrautils.diagstatus

2017-10-12 Thread Tom Pantelis
On Thu, Oct 12, 2017 at 6:10 AM, Tom Pantelis <tompante...@gmail.com> wrote:

>
>
> On Thu, Oct 12, 2017 at 6:05 AM, Faseela K <faseel...@ericsson.com> wrote:
>
>>
>>
>>
>>
>> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
>> *Sent:* Thursday, October 12, 2017 3:23 PM
>> *To:* Faseela K <faseel...@ericsson.com>
>> *Cc:* Anil Vishnoi <vishnoia...@gmail.com>; Muthukumaran K <
>> muthukumara...@ericsson.com>; infrautils-...@lists.opendaylight.org;
>> controller-dev@lists.opendaylight.org; R Srinivasan E <
>> r.e.sriniva...@ericsson.com>; Dayavanti Gopal Kamath <
>> dayavanti.gopal.kam...@ericsson.com>
>> *Subject:* Re: [controller-dev] Expose Datastore health to applications
>> via infrautils.diagstatus
>>
>>
>>
>>
>>
>> So here is how the diagstatus module works: any application should register
>> as a “service” with the framework and report an initial status (using the
>> APIs provided by diagstatus).
>>
>> There is another OSGi service, “ServiceStatusProvider”, exposed, and if
>> applications implement it, it will be called every time an external
>> request is made to get the current service status.
>>
>> In looking at the API, it appears an app would register with the
>> DiagStatusService and invoke report each time its status changes. An app
>> can also register a ServiceStatusProvider to report its status when
>> queried. It seems this is an alternative to interacting with the
>> DiagStatusService in looking at the DiagStatusServiceImpl which always
>> calls updateServiceStatusMap to query the ServiceStatusProviders from the
>> get* methods. Given that, why would an app need to explicitly register and
>> push its status to the DiagStatusService? Why not just advertise a
>> ServiceStatusProvider? This seems simpler. In that case,
>> DiagStatusServiceImpl doesn't need to maintain the statusMap - it would
>> just query the ServiceStatusProvider(s) on demand. Or am I missing
>> something?
>>
>>
>>
>> For services like “DATASTORE” only the pull model is required: just
>> register the service and implement ServiceStatusProvider.
>>
>> There are some use cases in genius where a push model was preferred, and
>> hence we have kept both options open.
>>
>
> OK. By "just register the service" I assume you mean just advertise a
> ServiceStatusProvider OSGi service. It is not necessary to explicitly
> register with the DiagStatusService as that is implicit by advertising a
> ServiceStatusProvider.
>

The code in DiagStatusServiceImpl does not enforce explicit registration:
one can just call report without a prior register call (not sure if that
was the original intent). Similarly, a ServiceStatusProvider's status is
reported even if it never explicitly called register.
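To make the pull-versus-push distinction concrete, here is a small hypothetical model in plain Java. `DiagStatusSketch`, `State`, `report` and `addProvider` are illustrative names only and are not the infrautils.diagstatus API. Pull-style providers are queried when status is requested, so no status map needs to be kept for them. Push-style reports are accepted without a prior register call and do need to be stored.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical model of push- vs pull-style status reporting; NOT the infrautils.diagstatus API.
class DiagStatusSketch {
    enum State { STARTING, OPERATIONAL, ERROR }

    // Push model: apps call report() whenever their state changes, so it must be stored.
    private final Map<String, State> pushed = new ConcurrentHashMap<>();
    // Pull model: apps only hand over a provider; nothing is cached.
    private final Map<String, Supplier<State>> providers = new ConcurrentHashMap<>();

    /** Accepted without any prior register() call. */
    void report(String service, State state) {
        pushed.put(service, state);
    }

    void addProvider(String service, Supplier<State> provider) {
        providers.put(service, provider);
    }

    /** Current view: pushed states plus providers queried right now (a provider wins on a name clash). */
    Map<String, State> currentStatus() {
        Map<String, State> result = new TreeMap<>(pushed);
        providers.forEach((name, provider) -> result.put(name, provider.get()));
        return result;
    }

    public static void main(String[] args) {
        DiagStatusSketch diag = new DiagStatusSketch();
        diag.report("APP_X", State.OPERATIONAL);              // push style (hypothetical service)
        boolean[] leadersElected = {false};
        diag.addProvider("DATASTORE",                         // pull style
                () -> leadersElected[0] ? State.OPERATIONAL : State.STARTING);
        System.out.println(diag.currentStatus()); // {APP_X=OPERATIONAL, DATASTORE=STARTING}
        leadersElected[0] = true;
        System.out.println(diag.currentStatus()); // {APP_X=OPERATIONAL, DATASTORE=OPERATIONAL}
    }
}
```

The DATASTORE entry changes from STARTING to OPERATIONAL without the provider pushing anything, which is the simplification being discussed.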

>
>>
>> Thanks,
>>
>> Faseela
>>
>>
>>
>
>


Re: [controller-dev] Expose Datastore health to applications via infrautils.diagstatus

2017-10-12 Thread Tom Pantelis
On Thu, Oct 12, 2017 at 6:05 AM, Faseela K <faseel...@ericsson.com> wrote:

>
>
>
>
> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
> *Sent:* Thursday, October 12, 2017 3:23 PM
> *To:* Faseela K <faseel...@ericsson.com>
> *Cc:* Anil Vishnoi <vishnoia...@gmail.com>; Muthukumaran K <
> muthukumara...@ericsson.com>; infrautils-...@lists.opendaylight.org;
> controller-dev@lists.opendaylight.org; R Srinivasan E <
> r.e.sriniva...@ericsson.com>; Dayavanti Gopal Kamath <
> dayavanti.gopal.kam...@ericsson.com>
> *Subject:* Re: [controller-dev] Expose Datastore health to applications
> via infrautils.diagstatus
>
>
>
>
>
> So here is how the diagstatus module works: any application should register
> as a “service” with the framework and report an initial status (using the
> APIs provided by diagstatus).
>
> There is another OSGi service, “ServiceStatusProvider”, exposed, and if
> applications implement it, it will be called every time an external
> request is made to get the current service status.
>
> In looking at the API, it appears an app would register with the
> DiagStatusService and invoke report each time its status changes. An app
> can also register a ServiceStatusProvider to report its status when
> queried. It seems this is an alternative to interacting with the
> DiagStatusService in looking at the DiagStatusServiceImpl which always
> calls updateServiceStatusMap to query the ServiceStatusProviders from the
> get* methods. Given that, why would an app need to explicitly register and
> push its status to the DiagStatusService? Why not just advertise a
> ServiceStatusProvider? This seems simpler. In that case,
> DiagStatusServiceImpl doesn't need to maintain the statusMap - it would
> just query the ServiceStatusProvider(s) on demand. Or am I missing
> something?
>
>
>
> For services like “DATASTORE” only the pull model is required: just
> register the service and implement ServiceStatusProvider.
>
> There are some use cases in genius where a push model was preferred, and
> hence we have kept both options open.
>

OK. By "just register the service" I assume you mean just advertise a
ServiceStatusProvider OSGi service. It is not necessary to explicitly
register with the DiagStatusService as that is implicit by advertising a
ServiceStatusProvider.

>
>
> Thanks,
>
> Faseela
>
>
>


Re: [controller-dev] Expose Datastore health to applications via infrautils.diagstatus

2017-10-12 Thread Tom Pantelis
>
> So here is how the diagstatus module works: any application should register
> as a “service” with the framework and report an initial status (using the
> APIs provided by diagstatus).
>
> There is another OSGi service, “ServiceStatusProvider”, exposed, and if
> applications implement it, it will be called every time an external
> request is made to get the current service status.
>
In looking at the API, it appears an app would register with the
DiagStatusService and invoke report each time its status changes. An app
can also register a ServiceStatusProvider to report its status when
queried. Looking at DiagStatusServiceImpl, which always calls
updateServiceStatusMap to query the ServiceStatusProviders from the get*
methods, this seems to be an alternative to interacting with the
DiagStatusService directly. Given that, why would an app need to explicitly
register and push its status to the DiagStatusService? Why not just
advertise a ServiceStatusProvider? This seems simpler. In that case,
DiagStatusServiceImpl doesn't need to maintain the statusMap; it would
just query the ServiceStatusProvider(s) on demand. Or am I missing
something?


>
>
> Thanks,
>
> Faseela
>
>


Re: [controller-dev] Expose Datastore health to applications via infrautils.diagstatus

2017-10-12 Thread Tom Pantelis
On Thu, Oct 12, 2017 at 3:08 AM, Muthukumaran K <muthukumara...@ericsson.com
> wrote:

> Hi Tom,
>
>
>
> While the initial status of the CDS is inferable using the aggregate
> SyncStatus, for dynamic status (eg. after startup, leader mobility in
> cluster due to load, availability scenarios like node-loss etc.), we were
> thinking of explicitly checking if all configured shards do have the leader
> or not (of course using the Shard Level MBeans).
>
>
>
> But, from your mail, I understand that the aggregate SyncStatus being set
> to false can be an easier way to address dynamic changes post start
> instead of doing shardwise checking.
>
>
>
> Is my understanding correct ?
>
>
>

That is correct. A shard reports a sync status change if it is a follower
and the leader changes, or if it becomes a candidate. If it is the leader,
its sync status is automatically true. A follower shard also reports that
it is not in sync if it lags behind the leader by a certain number of
commits (default 10).
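The per-shard rule described above can be written as a small predicate. This is a hypothetical model, not the actual CDS code. Treating a lag strictly greater than the threshold as "not in sync" is an assumption about the boundary. `allInSync` models the ShardManager aggregate as a plain AND over all shards.

```java
// Hypothetical model of the follower "in sync" rule; not the actual CDS implementation.
class ShardSyncSketch {
    enum Role { LEADER, FOLLOWER, CANDIDATE }

    static final long DEFAULT_SYNC_THRESHOLD = 10; // "default 10" commits, per the thread

    static boolean isInSync(Role role, boolean leaderKnown, long leaderCommitIndex,
                            long localCommitIndex, long threshold) {
        switch (role) {
            case LEADER:
                return true;   // a leader is always in sync with itself
            case CANDIDATE:
                return false;  // election in progress, so there is no leader to follow
            default:
                // FOLLOWER: needs a known leader and must not lag too far behind it.
                // Assumption: lagging by more than `threshold` commits means "not in sync".
                return leaderKnown && leaderCommitIndex - localCommitIndex <= threshold;
        }
    }

    /** Aggregate view, like the ShardManager's SyncStatus: true only if every shard is in sync. */
    static boolean allInSync(boolean... shardStatuses) {
        for (boolean inSync : shardStatuses) {
            if (!inSync) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean a = isInSync(Role.FOLLOWER, true, 120, 115, DEFAULT_SYNC_THRESHOLD); // lag 5
        boolean b = isInSync(Role.FOLLOWER, true, 120, 100, DEFAULT_SYNC_THRESHOLD); // lag 20
        boolean c = isInSync(Role.LEADER, true, 0, 0, DEFAULT_SYNC_THRESHOLD);
        System.out.println(a + " " + b + " " + c + " -> aggregate " + allInSync(a, b, c));
        // prints: true false true -> aggregate false
    }
}
```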


> Regards
>
> Muthu
>
>
>
>
>
> *From:* controller-dev-boun...@lists.opendaylight.org [mailto:
> controller-dev-boun...@lists.opendaylight.org] *On Behalf Of *Tom Pantelis
> *Sent:* Thursday, October 12, 2017 12:28 PM
> *To:* Faseela K
> *Cc:* infrautils-...@lists.opendaylight.org; controller-dev@lists.
> opendaylight.org; R Srinivasan E; Dayavanti Gopal Kamath
> *Subject:* Re: [controller-dev] Expose Datastore health to applications
> via infrautils.diagstatus
>
>
>
>
>
>
>
> On Wed, Oct 11, 2017 at 2:16 PM, Faseela K <faseel...@ericsson.com> wrote:
>
> Hello controller-dev,
>
>
>
>    We at infrautils have developed a status-and-diagnostics framework
> where applications can register their services and report when they are
> functionally up. Northbound and southbound interfaces for ODL can open up
> and accept configurations when all the required services are UP. As part
> of this, we were thinking of having a “DATASTORE” service whose status can
> be shown as “OPERATIONAL” when all the shards have properly elected their
> leaders. We do see that there are several MBeans exposed by the
> controller repo under *org.opendaylight.controller:Category=Shards,name="*
> +**+*",type=DistributedConfigDatastore*
> which can be used to derive the same information.
>
>    Instead of doing that from outside, we wanted to explore the possibility
> of integrating controller.sal-distributed-datastore with
> infrautils.diagstatus to report the status when the initial shard leader
> election is complete, and to implement the dynamic poll interface to fetch
> the shard leader status at random points in time. Please share your
> thoughts.
>
>
>
> This sounds like a reasonable idea.  CDS does have an aggregated shard
> sync status that is collected and reported by the ShardManager to
> the ShardManagerInfo MBean's SyncStatus attribute for each data store (eg
> *type=DistributedConfigDatastore,Category=ShardManager,name=shard-manager-config*).
> Once all shards report that they are "in sync" (ie a leader is elected and,
> if it's a follower, its journal is up-to-date with the leader),  the
> ShardManager sets the aggregate SyncStatus to true. Subsequently, if a
> shard loses its leader, the aggregate SyncStatus will be set to false.
>
>
>
> I'm not familiar enough with infrautils.diagstatus to know exactly how
> this status would be reported to that component. This would also require
> the controller project to depend on infrautils; not sure if that would
> be OK.
>
>
>
> Also, separate from SyncStatus, CDS blocks its blueprint startup until all
> shards have elected a leader  (up to 90 sec) so its OSGi services aren't
> advertised until then. Therefore all bundles that import those services
> will also be blocked on startup.
>
>
>
>
>
>
>
> Thanks,
>
> Faseela
>
>


Re: [controller-dev] Expose Datastore health to applications via infrautils.diagstatus

2017-10-12 Thread Tom Pantelis
On Wed, Oct 11, 2017 at 2:16 PM, Faseela K  wrote:

> Hello controller-dev,
>
>
>
>    We at infrautils have developed a status-and-diagnostics framework
> where applications can register their services and report when they are
> functionally up. Northbound and southbound interfaces for ODL can open up
> and accept configurations when all the required services are UP. As part
> of this, we were thinking of having a “DATASTORE” service whose status can
> be shown as “OPERATIONAL” when all the shards have properly elected their
> leaders. We do see that there are several MBeans exposed by the
> controller repo under *org.opendaylight.controller:Category=Shards,name="*
> +**+*",type=DistributedConfigDatastore*
> which can be used to derive the same information.
>
>    Instead of doing that from outside, we wanted to explore the possibility
> of integrating controller.sal-distributed-datastore with
> infrautils.diagstatus to report the status when the initial shard leader
> election is complete, and to implement the dynamic poll interface to fetch
> the shard leader status at random points in time. Please share your
> thoughts.
>

This sounds like a reasonable idea.  CDS does have an aggregated shard sync
status that is collected and reported by the ShardManager to
the ShardManagerInfo MBean's SyncStatus attribute for each data store (eg
*type=DistributedConfigDatastore,Category=ShardManager,name=shard-manager-config*).
Once all shards report that they are "in sync" (ie a leader is elected and,
if it's a follower, its journal is up-to-date with the leader),  the
ShardManager sets the aggregate SyncStatus to true. Subsequently, if a
shard loses its leader, the aggregate SyncStatus will be set to false.
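For dynamic checks after startup, the aggregate attribute can be polled with the standard `javax.management` API instead of inspecting every shard MBean. This is a sketch under two assumptions: the ShardManager MBean lives in the `org.opendaylight.controller` domain used by the shard MBeans, and the operational store follows the same naming pattern (`DistributedOperationalDatastore` / `shard-manager-operational`). Only the config name appears in this thread. The connection shown is the in-JVM platform MBean server. A remote client would obtain an `MBeanServerConnection` through `JMXConnectorFactory` instead.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Sketch: read the ShardManager's aggregate SyncStatus attribute over JMX.
// The object-name pattern for "operational" is an assumption; only the config
// name (type=DistributedConfigDatastore, name=shard-manager-config) is quoted in this thread.
class SyncStatusReader {
    static ObjectName shardManagerName(String datastore) {
        String type = "config".equals(datastore)
                ? "DistributedConfigDatastore" : "DistributedOperationalDatastore";
        try {
            return new ObjectName("org.opendaylight.controller:type=" + type
                    + ",Category=ShardManager,name=shard-manager-" + datastore);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("bad datastore name: " + datastore, e);
        }
    }

    /** True only if the MBean reports SyncStatus == true. */
    static boolean isSynced(MBeanServerConnection conn, String datastore)
            throws JMException, IOException {
        Object value = conn.getAttribute(shardManagerName(datastore), "SyncStatus");
        return Boolean.TRUE.equals(value);
    }

    public static void main(String[] args) throws Exception {
        // In-JVM lookup; a remote client would get its MBeanServerConnection
        // from JMXConnectorFactory.connect(...).getMBeanServerConnection().
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = shardManagerName("config");
        if (conn.isRegistered(name)) {
            System.out.println("config SyncStatus = " + isSynced(conn, "config"));
        } else {
            System.out.println("ShardManager MBean not registered in this JVM: " + name);
        }
    }
}
```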

I'm not familiar enough with infrautils.diagstatus to know exactly how
this status would be reported to that component. This would also require
the controller project to depend on infrautils; not sure if that would
be OK.

Also, separate from SyncStatus, CDS blocks its blueprint startup until all
shards have elected a leader (up to 90 sec), so its OSGi services aren't
advertised until then. Therefore all bundles that import those services
will also be blocked on startup.
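The startup gating described above can be illustrated with a `CountDownLatch` that is released once every shard has seen a leader, bounded by a timeout. This is a hypothetical illustration of the pattern, not how CDS or blueprint actually implement it. The shard names and the 90-second figure are only example inputs, and what CDS does when the timeout expires is not modelled here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustration only: hold back "advertising services" until every shard has a
// leader, or until a timeout expires. Not the actual CDS/blueprint mechanism.
class StartupGateSketch {
    private final CountDownLatch leadersPending;
    private final Set<String> shardsWithLeader = ConcurrentHashMap.newKeySet();

    StartupGateSketch(int shardCount) {
        leadersPending = new CountDownLatch(shardCount);
    }

    /** Called when a shard learns of a leader; repeated calls for the same shard are ignored. */
    void onLeaderElected(String shardName) {
        if (shardsWithLeader.add(shardName)) {
            leadersPending.countDown();
        }
    }

    /** Blocks until all shards have a leader; returns false on timeout or interruption. */
    boolean awaitLeaders(long timeout, TimeUnit unit) {
        try {
            return leadersPending.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        StartupGateSketch gate = new StartupGateSketch(2);
        new Thread(() -> {
            gate.onLeaderElected("default");
            gate.onLeaderElected("topology");
        }).start();
        if (gate.awaitLeaders(90, TimeUnit.SECONDS)) {
            System.out.println("all shards have a leader: advertise services");
        } else {
            System.out.println("timed out waiting for shard leaders");
        }
    }
}
```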



>
>
> Thanks,
>
> Faseela
>


Re: [controller-dev] Cluster issues debugging instructions

2017-09-26 Thread Tom Pantelis
On Tue, Sep 26, 2017 at 6:26 PM, Atul Gosain 
wrote:

> Controller devs
>
> We have been facing issues while trying scaled scenarios on the bgp and
> netconf projects when the setup was clustered. We tried some settings
> to improve the replication performance, like enabling tell-based mode,
> and then tinkered with a few other factory/akka.conf settings like
> akka.cluster.failure.detector.acceptable-heartbeat-pause. These measures
> improved performance a bit since the tolerance for heartbeat detection
> failure was increased.
>
> We also enabled Artery mode and made the corresponding setting
> changes in akka.conf to enable it. After some other setting changes, we
> could avoid the OOM issues that we faced initially. But in spite of these,
> we were not able to successfully run the cluster in Artery mode.
>
> We may be able to troubleshoot it further if we are able to find some more
> detailed logs from akka.
>
> Are there any
> 1. Log settings we can enable to get the debug/trace level logs from akka ?
>

You can set akka.loglevel=DEBUG in akka.conf and also enable the auxiliary
remote logging options described in
http://doc.akka.io/docs/akka/2.4.0/scala/logging.html#Auxiliary_remote_logging_options


> 2. Any other akka settings which we can tune to choose an acceptable
> tradeoff ?
>
> We have been using master for these tests to bring in the patches to
> improve cluster performance (mainly from TomP and Robert).
>
> --
>
> Thanks
> Atul
>


Re: [controller-dev] A question regarding binding independent components.

2017-09-15 Thread Tom Pantelis
On Fri, Sep 15, 2017 at 11:57 AM, Xingjun Chu 
wrote:

> Hi Tom and Controller team,
>
>
>
> I was reading the following wiki about the MDSAL, which basically talks
> about how Binding Independent Components work with each other. It states
> that all consumers and providers have to be registered before talking to
> each other. I don’t recall doing that when developing the FaaS module. Is
> this something obsolete, or has it been hidden by MDSAL so that all modules
> are registered as provider and consumer by default?
>
>
>
> https://wiki.opendaylight.org/view/OpenDaylight_Controller:Binding-Independent_Components
>
>
>
>
>

That document looks pretty old. Are you referring to the Provider and
Consumer Registration section? That mechanism of using the Broker to
register providers/consumers is legacy and has been superseded by
blueprint. With my recent patch, faas is now fully converted to blueprint.


> thanks
>
> Xingjun
>


Re: [controller-dev] yang-jmx-generator AbstractYangTest.loadYangFiles:56->AbstractYangTest.getConfigApiYangInputStreams:81->AbstractYangTest.getStreams:101 /META-INF/yang/config.yang is null

2017-09-14 Thread Tom Pantelis
On Thu, Sep 14, 2017 at 12:54 PM, Robert Varga <n...@hq.sk> wrote:

> On 14/09/17 17:25, Robert Varga wrote:
> >
> > On 14/09/17 16:31, Tom Pantelis wrote:
> >>
> >> On Thu, Sep 14, 2017 at 10:30 AM, Tom Pantelis <tompante...@gmail.com
> >> <mailto:tompante...@gmail.com>> wrote:
> >>
> >> I just refreshed my local yangtools and controller and see the
> >> errors. Under config-api/target/classes/META-INF/yang/ the
> generated
> >> file is not con...@2013-04-05.yang. Something changed in
> >> yangtools  I suppose we need to change the tests to look for
> >> con...@2013-04-05.yang
> >>
> >>
> >> I meant "the generated file is NOW con...@2013-04-05.yang"
> > Yea, a slight adjustment is needed in the BasicCodeGenerator
> > implementation. Give me 10 minutes :)
>
> Ah, actually it's just a simple thing ...
> https://git.opendaylight.org/gerrit/63147 should do the trick.
>
>
Yeah, that's what I thought originally, but I was wondering whether having
to append the revision was intended. I hope this doesn't break other areas
as well...


> Bye,
> Robert
>


Re: [controller-dev] yang-jmx-generator AbstractYangTest.loadYangFiles:56->AbstractYangTest.getConfigApiYangInputStreams:81->AbstractYangTest.getStreams:101 /META-INF/yang/config.yang is null

2017-09-14 Thread Tom Pantelis
On Thu, Sep 14, 2017 at 11:25 AM, Robert Varga <n...@hq.sk> wrote:

>
>
> On 14/09/17 16:31, Tom Pantelis wrote:
> >
> >
> > On Thu, Sep 14, 2017 at 10:30 AM, Tom Pantelis <tompante...@gmail.com
> > <mailto:tompante...@gmail.com>> wrote:
> >
> > I just refreshed my local yangtools and controller and see the
> > errors. Under config-api/target/classes/META-INF/yang/ the generated
> > file is not con...@2013-04-05.yang. Something changed in
> > yangtools  I suppose we need to change the tests to look for
> > con...@2013-04-05.yang
> >
> >
> > I meant "the generated file is NOW con...@2013-04-05.yang"
>
> Yea, a slight adjustment is needed in the BasicCodeGenerator
> implementation. Give me 10 minutes :)
>

Cool. Thanks Robert.


>
> Regards,
> Robert
>
>


Re: [controller-dev] yang-jmx-generator AbstractYangTest.loadYangFiles:56->AbstractYangTest.getConfigApiYangInputStreams:81->AbstractYangTest.getStreams:101 /META-INF/yang/config.yang is null

2017-09-14 Thread Tom Pantelis
On Thu, Sep 14, 2017 at 10:30 AM, Tom Pantelis <tompante...@gmail.com>
wrote:

> I just refreshed my local yangtools and controller and see the errors.
> Under config-api/target/classes/META-INF/yang/ the generated file is
> not con...@2013-04-05.yang. Something changed in yangtools. I suppose
> we need to change the tests to look for con...@2013-04-05.yang
>
>
I meant "the generated file is NOW con...@2013-04-05.yang"
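If the tests should tolerate both file names, one option is a small helper that builds the resource path from the module name and revision and falls back to the bare name. This is a hypothetical sketch, not necessarily what the gerrit change mentioned in this thread does. It assumes the generator now follows the RFC 6020/7950 file naming convention `module-name@revision-date.yang`, with the module name `config` taken from the failing `/META-INF/yang/config.yang` path.

```java
import java.io.InputStream;

// Hypothetical test helper: locate a packaged YANG file whether or not the
// generator appended the revision to the file name.
class YangResourceLookup {
    /** RFC 6020/7950 naming: module-name ["@" revision-date] ".yang". */
    static String resourcePath(String module, String revision) {
        String file = (revision == null) ? module + ".yang" : module + "@" + revision + ".yang";
        return "/META-INF/yang/" + file;
    }

    /** Tries the revision-qualified name first, then the bare name; returns null if neither exists. */
    static InputStream open(Class<?> anchor, String module, String revision) {
        InputStream in = anchor.getResourceAsStream(resourcePath(module, revision));
        if (in == null && revision != null) {
            in = anchor.getResourceAsStream(resourcePath(module, null));
        }
        return in;
    }

    public static void main(String[] args) {
        System.out.println(resourcePath("config", "2013-04-05"));
        System.out.println("found: " + (open(YangResourceLookup.class, "config", "2013-04-05") != null));
    }
}
```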


> On Thu, Sep 14, 2017 at 10:03 AM, Michael Vorburger <vorbur...@redhat.com>
> wrote:
>
>> Hey guys,
>>
>> anyone know what this build failure which popped up in the last 24 hours
>> I think is all about:
>>
>> [INFO] config-plugin-parent ... SUCCESS [  0.378 s]
>> [INFO] yang-jmx-generator . FAILURE [  5.537 s]
>>
>> 12:32:34 Running org.opendaylight.controller.config.yangjmxgenerator.unknownextension.UnknownExtensionTest
>> 12:32:35 Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.264 sec <<< FAILURE! - in org.opendaylight.controller.config.yangjmxgenerator.unknownextension.UnknownExtensionTest
>> 12:32:35 testStopOnUnknownLanguageExtension(org.opendaylight.controller.config.yangjmxgenerator.unknownextension.UnknownExtensionTest)  Time elapsed: 0.026 sec  <<< FAILURE!
>> 12:32:35 java.lang.AssertionError: /META-INF/yang/config.yang is null
>> 12:32:35     at org.junit.Assert.fail(Assert.java:88)
>> 12:32:35     at org.junit.Assert.assertTrue(Assert.java:41)
>> 12:32:35     at org.junit.Assert.assertNotNull(Assert.java:621)
>> 12:32:35     at org.opendaylight.controller.config.yangjmxgenerator.AbstractYangTest.getStreams(AbstractYangTest.java:101)
>>
>> Tx,
>> M.
>> --
>> Michael Vorburger, Red Hat
>> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>>


Re: [controller-dev] yang-jmx-generator AbstractYangTest.loadYangFiles:56->AbstractYangTest.getConfigApiYangInputStreams:81->AbstractYangTest.getStreams:101 /META-INF/yang/config.yang is null

2017-09-14 Thread Tom Pantelis
I just refreshed my local yangtools and controller and see the errors.
Under config-api/target/classes/META-INF/yang/ the generated file is
not con...@2013-04-05.yang. Something changed in yangtools. I suppose
we need to change the tests to look for con...@2013-04-05.yang

On Thu, Sep 14, 2017 at 10:03 AM, Michael Vorburger 
wrote:

> Hey guys,
>
> anyone know what this build failure which popped up in the last 24 hours I
> think is all about:
>
> [INFO] config-plugin-parent ... SUCCESS [  0.378 s]
> [INFO] yang-jmx-generator . FAILURE [  5.537 s]
>
> 12:32:34 Running org.opendaylight.controller.config.yangjmxgenerator.unknownextension.UnknownExtensionTest
> 12:32:35 Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 0.264 sec <<< FAILURE! - in org.opendaylight.controller.config.yangjmxgenerator.unknownextension.UnknownExtensionTest
> 12:32:35 testStopOnUnknownLanguageExtension(org.opendaylight.controller.config.yangjmxgenerator.unknownextension.UnknownExtensionTest)  Time elapsed: 0.026 sec  <<< FAILURE!
> 12:32:35 java.lang.AssertionError: /META-INF/yang/config.yang is null
> 12:32:35     at org.junit.Assert.fail(Assert.java:88)
> 12:32:35     at org.junit.Assert.assertTrue(Assert.java:41)
> 12:32:35     at org.junit.Assert.assertNotNull(Assert.java:621)
> 12:32:35     at org.opendaylight.controller.config.yangjmxgenerator.AbstractYangTest.getStreams(AbstractYangTest.java:101)
>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>


Re: [controller-dev] No ReadWriteTransaction (newReadWriteTransaction) on mdsal DataBroker, only controller's

2017-09-14 Thread Tom Pantelis
On Wed, Sep 13, 2017 at 8:11 PM, Michael Vorburger 
wrote:

> Helo,
>
> I'm curious why the org.opendaylight.mdsal.binding.api.DataBroker does
> not have a ReadWriteTransaction newReadWriteTransaction() method like the
> org.opendaylight.controller.md.sal.binding.api.DataBroker does?
>

That's a good question. ReadWriteTransaction is widely used so its absence
will make it difficult to switch to the mdsal APIs. I submitted
https://git.opendaylight.org/gerrit/#/c/50156/ a while ago to add it.


>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>


Re: [controller-dev] [release] Autorelease nitrogen failed to build sal-akka-raft from controller

2017-09-05 Thread Tom Pantelis
It's intermittent. I've never seen that test fail - the log output
indicates it was correct and should not have triggered a failure. So far
I've run it over 100 times successfully locally.

On Tue, Sep 5, 2017 at 9:02 PM, Thanh Ha 
wrote:

> On Tue, Sep 5, 2017 at 8:35 PM, Jenkins  opendaylight.org> wrote:
>
>> Attention controller-devs,
>>
>> Autorelease nitrogen failed to build sal-akka-raft from controller in
>> build
>> 189. Attached is a snippet of the error message related to the
>> failure that we were able to automatically parse as well as console logs.
>>
>>
>> Console Logs:
>> https://logs.opendaylight.org/releng/jenkins092/autorelease-release-nitrogen/189
>>
>> Jenkins Build:
>> https://jenkins.opendaylight.org/releng/job/autorelease-release-nitrogen/189/
>>
>> Please review and provide an ETA on when a fix will be available.
>>
>> Thanks,
>> ODL releng/autorelease team
>>
>>
> Looks like a test failure. I'm not sure if it's intermittent so I'll start
> another build.
>
> Regards,
> Thanh
>
>


Re: [controller-dev] DataStore-write

2017-09-01 Thread Tom Pantelis
On Fri, Sep 1, 2017 at 10:29 AM, qw...@ticomm.cn  wrote:

> Hi Folks,
> I have the following problem; can anyone give me an idea?
>
> *ERROR info:*
> 2017-09-01 22:15:15,925 | WARN  | lt-dispatcher-43 | ConcurrentDOMDataBroker  | 207 - org.opendaylight.controller.sal-distributed-datastore - 1.5.0.Carbon | Tx: DOM-65 Error during phase CAN_COMMIT, starting Abort
> OptimisticLockFailedException{message=Optimistic lock failed., errorList=[RpcError [message=Optimistic lock failed., severity=ERROR, errorType=APPLICATION, tag=resource-denied, applicationTag=null, info=null, cause=org.opendaylight.yangtools.yang.data.api.schema.tree.ConflictingModificationAppliedException: Node was replaced by other transaction.]]}
>

This means you have 2 writers/threads updating the same part of the data
tree around the same time.
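To make this failure mode concrete, here is a minimal, self-contained Java sketch of optimistic concurrency control. It does not use the OpenDaylight APIs. VersionedStore, StaleBaseException and Retry.updateWithRetry are hypothetical names used only for illustration. Each commit is checked against the version its snapshot was read from. When two writers start from the same state, the second commit is rejected, much like the CAN_COMMIT failure in the log above. One common remedy, assumed here, is to re-read the current state and retry. The other is to make sure only one writer touches that part of the tree.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical demo; none of these classes are OpenDaylight APIs.
class OptimisticDemo {
    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        // Two writers read the same base state at around the same time.
        VersionedStore.Snapshot a = store.read("/counter");
        VersionedStore.Snapshot b = store.read("/counter");
        store.commit(a, a.value + 1); // writer A commits first
        try {
            store.commit(b, b.value + 1); // writer B's base is now stale
        } catch (StaleBaseException e) {
            System.out.println("B rejected: " + e.getMessage());
        }
        // Re-reading and retrying applies B's update on top of A's.
        int attempts = Retry.updateWithRetry(store, "/counter", v -> v + 1, 3);
        System.out.println("value=" + store.read("/counter").value + ", attempts=" + attempts);
    }
}

/** Raised when the committed version no longer matches the snapshot's version. */
class StaleBaseException extends RuntimeException {
    StaleBaseException(String path) {
        super("node " + path + " was modified by another transaction");
    }
}

/** Hypothetical in-memory store that bumps a per-path version on every commit. */
class VersionedStore {
    static final class Snapshot {
        final String path;
        final int value;
        final long version;

        Snapshot(String path, int value, long version) {
            this.path = path;
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Integer> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    synchronized Snapshot read(String path) {
        return new Snapshot(path, values.getOrDefault(path, 0), versions.getOrDefault(path, 0L));
    }

    /** Optimistic check: commit only if nobody committed to this path since the snapshot. */
    synchronized void commit(Snapshot base, int newValue) {
        long current = versions.getOrDefault(base.path, 0L);
        if (current != base.version) {
            throw new StaleBaseException(base.path);
        }
        values.put(base.path, newValue);
        versions.put(base.path, current + 1);
    }
}

class Retry {
    /** Re-reads and re-applies the update until it commits; returns the successful attempt number. */
    static int updateWithRetry(VersionedStore store, String path, UnaryOperator<Integer> update, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            VersionedStore.Snapshot snap = store.read(path);
            try {
                store.commit(snap, update.apply(snap.value));
                return attempt;
            } catch (StaleBaseException e) {
                // Another writer won the race; loop to pick up its result.
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }
}
```

Running OptimisticDemo prints B's rejection and then value=2, attempts=1. A retry is only safe if the update is recomputed from freshly read state, as it is here. Blindly resubmitting the same data would overwrite the other writer's change.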



> at org.opendaylight.controller.cluster.datastore.ShardDataTree.lambda$
> processNextPendingTransaction$0(ShardDataTree.java:750)[207:
> org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
> at org.opendaylight.controller.cluster.datastore.ShardDataTree.
> processNextPending(ShardDataTree.java:788)[207:
> org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
> at org.opendaylight.controller.cluster.datastore.ShardDataTree.
> processNextPendingTransaction(ShardDataTree.java:733)[207:
> org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
> at org.opendaylight.controller.cluster.datastore.
> ShardDataTree.startCanCommit(ShardDataTree.java:814)[207:
> org.opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
> at org.opendaylight.controller.cluster.datastore.
> SimpleShardDataTreeCohort.canCommit(SimpleShardDataTreeCohort.
> java:105)[207:org.opendaylight.controller.sal-distributed-datastore:1.5.0.
> Carbon]
> at org.opendaylight.controller.cluster.datastore.CohortEntry.canCommit(
> CohortEntry.java:97)[207:org.opendaylight.controller.sal-
> distributed-datastore:1.5.0.Carbon]
> at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.
> handleCanCommit(ShardCommitCoordinator.java:236)[207:org.opendaylight.
> controller.sal-distributed-datastore:1.5.0.Carbon]
> at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.
> handleReadyLocalTransaction(ShardCommitCoordinator.java:
> 200)[207:org.opendaylight.controller.sal-distributed-
> datastore:1.5.0.Carbon]
> at org.opendaylight.controller.cluster.datastore.Shard.
> handleReadyLocalTransaction(Shard.java:623)[207:org.
> opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
> at org.opendaylight.controller.cluster.datastore.
> Shard.handleNonRaftCommand(Shard.java:313)[207:org.
> opendaylight.controller.sal-distributed-datastore:1.5.0.Carbon]
> at org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(
> RaftActor.java:270)[671:org.opendaylight.controller.sal-
> akka-raft:1.5.0.Carbon]
> at org.opendaylight.controller.cluster.common.actor.
> AbstractUntypedPersistentActor.onReceiveCommand(
> AbstractUntypedPersistentActor.java:31)[670:org.
> opendaylight.controller.sal-clustering-commons:1.5.0.Carbon]
> at akka.persistence.UntypedPersistentActor.onReceive(PersistentActor.
> scala:170)[188:com.typesafe.akka.persistence:2.4.17]
> at org.opendaylight.controller.cluster.common.
> actor.MeteringBehavior.apply(MeteringBehavior.java:104)[
> 670:org.opendaylight.controller.sal-clustering-commons:1.5.0.Carbon]
> at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(
> ActorCell.scala:544)[181:com.typesafe.akka.actor:2.4.17]
> at akka.actor.Actor$class.aroundReceive(Actor.scala:497)
> [181:com.typesafe.akka.actor:2.4.17]
> at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$
> super$aroundReceive(PersistentActor.scala:168)[188:com.typesafe.akka.
> persistence:2.4.17]
> at akka.persistence.Eventsourced$$anon$1.stateReceive(Eventsourced.
> scala:664)[188:com.typesafe.akka.persistence:2.4.17]
> at akka.persistence.Eventsourced$class.aroundReceive(Eventsourced.
> scala:183)[188:com.typesafe.akka.persistence:2.4.17]
> at akka.persistence.UntypedPersistentActor.aroundReceive(PersistentActor.
> scala:168)[188:com.typesafe.akka.persistence:2.4.17]
> at akka.actor.ActorCell.receiveMessage(ActorCell.
> scala:526)[181:com.typesafe.akka.actor:2.4.17]
> at akka.actor.ActorCell.invoke(ActorCell.scala:495)[
> 181:com.typesafe.akka.actor:2.4.17]
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:
> 257)[181:com.typesafe.akka.actor:2.4.17]
> at akka.dispatch.Mailbox.run(Mailbox.scala:224)[181:com.
> typesafe.akka.actor:2.4.17]
> at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[181:com.
> typesafe.akka.actor:2.4.17]
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(
> ForkJoinTask.java:260)[177:org.scala-lang.scala-library:
> 2.11.8.v20160304-115712-1706a37eb8]
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.
> runTask(ForkJoinPool.java:1339)[177:org.scala-lang.scala-library:2.11.8.
> 

Re: [controller-dev] Can we remove remaining use of CSS in controller itself?

2017-08-29 Thread Tom Pantelis
Unfortunately usage of CSS in sal-distributed-datastore, sal-dom-broker-config
et al. needs to remain until it is decided to completely remove CSS, which
we're targeting for Fluorine. I have cleaned up CSS from most projects
participating in Nitrogen.

On Tue, Aug 29, 2017 at 9:35 AM, Michael Vorburger 
wrote:

> Hello controllers,
>
> while starting to see how to solve
> https://bugs.opendaylight.org/show_bug.cgi?id=9068, I realized that those
> are CSS related problems... but we anyway don't need / use CSS anymore, on
> master, do we?
>
> There may be other reload problems for the scenario of Bug 9068 afterwards, but
> would it be a good idea and a welcome contribution if I were to propose a
> Gerrit to at least already remove the remaining use of config-parent from
> sal-distributed-datastore, sal-clustering-commons and sal-dom-broker-config
> (what is that?) and messagebus-impl?
>
> I would still leave config/config-parent & Co. in controller,
> for the moment, as even if we rid controller itself of any CSS
> completely, I don't know what use other projects still make of it -
> cleaning that up could be a subsequent next step to above.
>
> Or is this not as simple as I think it could be (not "just" replacing
> some config-parent by binding-parent), or do we actually somehow still use CSS
> a bit, so can't do?
>
> Tx,
> M.
> --
> Michael Vorburger, Red Hat
> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
>


Re: [controller-dev] [release] Autorelease carbon failed to build sal-distributed-datastore from controller

2017-08-28 Thread Tom Pantelis
On Sun, Aug 27, 2017 at 11:20 PM, Anil Belur 
wrote:

> Hello controller-dev,
>
> We are seeing the following test failure for Carbon builds. Please confirm
> whether this needs to be fixed or is an intermittent test failure.
>

It's intermittent.


>
> Failed tests:
>   EntityOwnershipShardTest.testOwnerChangesOnPeerAvailabilityChanges:647->
> AbstractEntityOwnershipTest.verifyRaftState:275->lambda$
> testOwnerChangesOnPeerAvailabilityChanges$2:648 getRaftState
> expected:<[Leader]> but was:<[Candidate]>
>
> Thanks,
> Anil
>
> On Mon, Aug 28, 2017 at 11:12 AM, Jenkins wrote:
>
>> Attention controller-devs,
>>
>> Autorelease carbon failed to build sal-distributed-datastore from
>> controller in build 447. Attached is a snippet of the error message related
>> to the failure that we were able to automatically parse as well as console logs.
>>
>>
>> Console Logs:
>> https://logs.opendaylight.org/releng/jenkins092/autorelease-release-carbon/447
>>
>> Jenkins Build:
>> https://jenkins.opendaylight.org/releng/job/autorelease-release-carbon/447/
>>
>> Please review and provide an ETA on when a fix will be available.
>>
>> Thanks,
>> ODL releng/autorelease team
>>
>>


Re: [controller-dev] OOM Bug 9034

2017-08-24 Thread Tom Pantelis
On Thu, Aug 24, 2017 at 12:17 PM, Michael Vorburger 
wrote:

> On Thu, Aug 24, 2017 at 9:06 AM, Michael Vorburger 
>> wrote:
>>
>>> On Wed, Aug 23, 2017 at 8:54 PM, Robert Varga  wrote:
>>>
>>>> On 23/08/17 20:21, Michael Vorburger wrote:
>>>> > On Wed, Aug 23, 2017 at 2:11 PM, Robert Varga wrote:
>>>> >
>>>> > On 23/08/17 13:48, Michael Vorburger wrote:
>>>> > >
>>>> > > Robert> Actually no. The backend side of things looks okay, as chains
>>>> > > are being both closed and purged when requested from the frontend. I
>>>> > > suspect somebody is forgetting to close their transaction chains...
>>>> > >
>>>> > > Michael> Is a "transaction chain" the same as a transaction - the
>>>> > > objects returned from DataBroker's newReadOnlyTransaction /
>>>> > > newReadWriteTransaction / newWriteOnlyTransaction? I did stumble upon
>>>> > > something - could https://git.opendaylight.org/gerrit/#/c/62196/ fix
>>>> > > this problem? Or is that unlikely to fix this OOM, but still a Good
>>>> > > Idea? Or does it not matter to call cancel() if not submit()?
>>>> >
>>>> > 'the same' in the sense that both are resources which need to be closed.
>>>> > So yes, it does matter that they are properly closed.
>>>> >
>>>> >
>>>> > Thanks. I spent the afternoon shooting in a couple of directions related
>>>> > to this (details in https://bugs.opendaylight.org/show_bug.cgi?id=9034#).
>>>> >
>>>> > But the problem is what I've found so far is all just "stabs in the
>>>> > dark". What we really need is a way to more reliably find the origin of
>>>> > code where transactions (or TransactionChain) are created but never
>>>> > closed...
>>>> >
>>>> > You mentioned on IRC that a datastore.cfg may have some option for that,
>>>> > but I could not find this - does anyone know any details about that?
>>>>
>>>> nite@nitebug : ~$ cd distribution-karaf-0.6.2-SNAPSHOT/
>>>> nite@nitebug : ~/distribution-karaf-0.6.2-SNAPSHOT$ ./bin/karaf
>>>> opendaylight-user@root>feature:install odl-mdsal-broker
>>>> nite@nitebug : ~/distribution-karaf-0.6.2-SNAPSHOT$ ls -l
>>>> etc/org.opendaylight.controller.cluster.datastore.cfg
>>>> -rw-rw-r--. 1 nite nite 4896 Aug 23 20:47
>>>> etc/org.opendaylight.controller.cluster.datastore.cfg
>>>>
>>>> for some reason debug-transactions is not documented.
>>>> https://git.opendaylight.org/gerrit/62224 fixes that.

>>>
>>> Tom, I've tried to dig into this (supposed) debug-transactions flag,
>>> but am starting to have doubts if this feature actually really exists /
>>> works (anymore? or ever did?) ... do you know anything more about the
>>> history of this, or can take a moment to help us find out more? Because Robert
>>> on IRC said: "yeah ... I don't know the exact story // the yang models was
>>> definitely used when we use CSS // what the exact state is these days ...
>>> best ping tpantelis". So here's what I've found so far:
>>>
>>> sal-distributed-datastore/src/main/yang/distributed-datastore-provider.yang
>>> is where this "debug-transactions" should be, because that is the model for
>>> datastore.cfg. That YANG model does have all the other properties used in
>>> that datastore.cfg, just not debug-transactions.
>>>
>>
>> It's defined in the yang as transaction-debug-context-enabled so the
>> .cfg file must have the same name.
>>
>
> Duh! I should have seen that.. :-( sorry & thanks a lot.
>
> So with that I'm starting to see the related stuff in the code. What I
> don't get yet is how this was intended to be "surfaced" - the idea
> probably is just to find the throwable during forensics of an hprof heap
> dump after an OOM, if one finds a large number of ClientBackedTransaction
> (subclass) instances? There's no CLI command (or JMX, or whatever) to "list
> all unclosed transactions", is there?
>
> More importantly, as far as I can see so far by going through the code
> (not all of which makes a whole lot of sense so far, I'll admit!), this
> flag as is won't actually help us to find the root cause of
> https://bugs.opendaylight.org/show_bug.cgi?id=9034 yet - which is what
> this thread is really about. Because the ClientBackedDataStore keeps the
> allocationContext() for newRead/Write[Only]Transaction(), but for
> createTransactionChain() it just passes on debugAllocation() - so the
> Txs that the chain creates may or may not have their respective
> allocationContext preserved... but if I understand Robert's analysis of
> this Bug 9034 correctly, he's saying that the overflowing Map I'm seeing there
> is due to (many) TransactionChains themselves (not their Txs) not being
> closed - am I getting this right? So to get to the bottom of that, we'll
> need to (also) track the allocationContext at createTransactionChain()
> itself, don't 

Re: [controller-dev] OOM Bug 9034

2017-08-24 Thread Tom Pantelis
On Thu, Aug 24, 2017 at 9:06 AM, Michael Vorburger 
wrote:

> On Wed, Aug 23, 2017 at 8:54 PM, Robert Varga  wrote:
>
>> On 23/08/17 20:21, Michael Vorburger wrote:
>> > On Wed, Aug 23, 2017 at 2:11 PM, Robert Varga wrote:
>> >
>> > On 23/08/17 13:48, Michael Vorburger wrote:
>> > >
>> > > Robert> Actually no. The backend side of things looks okay, as chains
>> > > are being both closed and purged when requested from the frontend. I
>> > > suspect somebody is forgetting to close their transaction chains...
>> > >
>> > > Michael> Is a "transaction chain" the same as a transaction - the
>> > > objects returned from DataBroker's newReadOnlyTransaction /
>> > > newReadWriteTransaction / newWriteOnlyTransaction? I did stumble upon
>> > > something - could https://git.opendaylight.org/gerrit/#/c/62196/ fix
>> > > this problem? Or is that unlikely to fix this OOM, but still a Good
>> > > Idea? Or does it not matter to call cancel() if not submit()?
>> >
>> > 'the same' in the sense that both are resources which need to be closed.
>> > So yes, it does matter that they are properly closed.
>> >
>> >
>> > Thanks. I spent the afternoon shooting in a couple of directions related
>> > to this (details in https://bugs.opendaylight.org/show_bug.cgi?id=9034#).
>> >
>> > But the problem is what I've found so far is all just "stabs in the
>> > dark". What we really need is a way to more reliably find the origin of
>> > code where transactions (or TransactionChain) are created but never
>> > closed...
>> >
>> > You mentioned on IRC that a datastore.cfg may have some option for that,
>> > but I could not find this - does anyone know any details about that?
>>
>> nite@nitebug : ~$ cd distribution-karaf-0.6.2-SNAPSHOT/
>> nite@nitebug : ~/distribution-karaf-0.6.2-SNAPSHOT$ ./bin/karaf
>> opendaylight-user@root>feature:install odl-mdsal-broker
>> nite@nitebug : ~/distribution-karaf-0.6.2-SNAPSHOT$ ls -l
>> etc/org.opendaylight.controller.cluster.datastore.cfg
>> -rw-rw-r--. 1 nite nite 4896 Aug 23 20:47
>> etc/org.opendaylight.controller.cluster.datastore.cfg
>>
>> for some reason debug-transactions is not documented.
>> https://git.opendaylight.org/gerrit/62224 fixes that.
>>
>
> Tom, I've tried to dig into this (supposed) debug-transactions flag, but
> am starting to have doubts if this feature actually really exists / works
> (anymore? or ever did?) ... do you know anything more about the history of
> this, or can take a moment to help us find out more? Because Robert on IRC
> said: "yeah ... I don't know the exact story // the yang models was
> definitely used when we use CSS // what the exact state is these days ...
> best ping tpantelis". So here's what I've found so far:
>
> sal-distributed-datastore/src/main/yang/distributed-datastore-provider.yang
> is where this "debug-transactions" should be, because that is the model for
> datastore.cfg. That YANG model does have all the other properties used in
> that datastore.cfg, just not debug-transactions.
>

It's defined in the yang as transaction-debug-context-enabled so the .cfg
file must have the same name.
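Going by the reply above, enabling the flag would mean a line transaction-debug-context-enabled=true in etc/org.opendaylight.controller.cluster.datastore.cfg. The idea behind such a debug flag is to capture a Throwable when a resource is allocated. Anything left unclosed can then be traced back to the code that created it. That technique can be sketched in plain Java. LeakTracker and TrackedChain below are hypothetical names, not OpenDaylight classes. The sketch tracks the chain object itself, which is the gap pointed out earlier in the thread for createTransactionChain().

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo; LeakTracker and TrackedChain are not OpenDaylight classes.
class LeakDemo {
    public static void main(String[] args) {
        LeakTracker tracker = new LeakTracker();
        try (TrackedChain ok = new TrackedChain(tracker)) {
            // ... use the chain; try-with-resources guarantees close()
        }
        TrackedChain leaked = new TrackedChain(tracker); // never closed
        System.out.println("open chains: " + tracker.openCount());
        for (Throwable where : tracker.openAllocations()) {
            // The trace includes the LeakDemo.main line that allocated the chain.
            where.printStackTrace(System.out);
        }
    }
}

/** Remembers where each still-open resource was created. */
class LeakTracker {
    // Strong references keep the sketch short; a real tool might prefer weak keys.
    private final Map<Object, Throwable> open = new ConcurrentHashMap<>();

    void opened(Object resource) {
        // Never thrown: constructing it just records the current stack trace.
        open.put(resource, new Throwable("allocated here"));
    }

    void closed(Object resource) {
        open.remove(resource);
    }

    Collection<Throwable> openAllocations() {
        return open.values();
    }

    int openCount() {
        return open.size();
    }
}

/** Hypothetical stand-in for a transaction chain. */
class TrackedChain implements AutoCloseable {
    private final LeakTracker tracker;
    private boolean closed;

    TrackedChain(LeakTracker tracker) {
        this.tracker = tracker;
        tracker.opened(this);
    }

    @Override
    public void close() {
        if (!closed) { // idempotent close
            closed = true;
            tracker.closed(this);
        }
    }
}
```

Running LeakDemo reports one open chain, and its stack trace leads back to the allocation in main. The strong map in this sketch would itself keep leaked objects reachable. A weak-keyed or java.lang.ref.Cleaner-based variant would avoid that.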


> sal-inmemory-datastore/src/main/yang/opendaylight-inmemory-datastore-provider.yang
> does have a "debug-transactions" - so
> its ...inmemory.datastore.provider.rev140617.DatastoreConfiguration has
> an isDebugTransactions() - but I cannot find any usage of it, even in
> inmemory-datastore.
>
> So it seems to be that something got lost here somewhere along the way -
> is that possible? Or am I just too dumb, and not understanding what's where
> in all this controller mdsal code?? ;-)
>
> BTW: What's the story for what is used and what is not used -
> inmemory-datastore is history? Want me to raise a Gerrit to remove it? Or
> is it still used for the test DataBroker (which I myself am a user of in our
> tests in genius and netvirt)?
>

It's not used in production but still used in unit tests. It will be
removed once we switch to the mdsal project.


>
>
>> > Otherwise, an idea I just had in
>> > https://bugs.opendaylight.org/show_bug.cgi?id=9034#c7 would be to see if
>> > I could bolt something onto that mdsal-trace we have (originally
>> > contributed by Josh) to keep track of opened-but-not-yet-closed
>> > transactions. See the idea? Could be very useful to get to the bottom of
>> > this kind of problem, no?
>>
>> No need, as mentioned above.
>>
>
> Well, until proven otherwise, it would seem there is a need after all
> then? ;-)
>
> Let me spend some more time to see if I could build this feature - I'll
> try to start rebuilding it from scratch; if there's existing code for this
> (which I cannot find), please point me to it so I don't redo what already
> exists somewhere!
>
>
>> > > > As far as I can see, with my still very limited
>> understanding of mdsal
>> 

Re: [controller-dev] Controller/Clustering Carbon SR2 Blocker Bug

2017-08-16 Thread Tom Pantelis
I would say to revert it at this point.

On Wed, Aug 16, 2017 at 7:35 PM, An Ho <an...@huawei.com> wrote:

> +Add CONTROLLER team.
>
> Hi Tom Pantelis and Robert Varga,
>
> Please help investigate this issue blocking Carbon SR2.  Several projects
> are having issues with their 3node tests.  We believe
> https://git.opendaylight.org/gerrit/#/c/61433/ is causing problems in
> 3node tests.  Please let us know if we can revert the patch or provide a
> new patch to fix the blocking issue.
>
> Best Regards,
> An Ho
>
>
> -Original Message-
> From: release-boun...@lists.opendaylight.org [mailto:
> release-boun...@lists.opendaylight.org] On Behalf Of Sam Hague
> Sent: Wednesday, August 16, 2017 3:55 PM
> To: Robert Varga; Jamo Luhrsen; Stephen Kitt; Release (
> rele...@lists.opendaylight.org)
> Subject: Re: [release] https://git.opendaylight.org/gerrit/#/c/61433/
> causing problems in 3node tests
>
> forgot to include release list
>
> looks like multiple projects are having 3node issues: openflowplugin,
> netvirt, controller and vtn.
>
> On Wed, Aug 16, 2017 at 6:42 PM, Sam Hague <sha...@redhat.com> wrote:
> > Robert,
> >
> > the patch seems to be causing issues for the 3node ha tests [1]. Over
> > half the tests fail now, ever since that patch went in. Any idea how it
> > could produce the following exception? This repeats all through the
> > tests.
> >
> > java.lang.IllegalStateException: Store tree
> > org.opendaylight.yangtools.yang.data.api.schema.tree.spi.MaterializedContainerNode@38ce1623
> > and candidate base
> > org.opendaylight.yangtools.yang.data.api.schema.tree.spi.MaterializedContainerNode@78447ed8
> > differ.
> > TransactionCommitFailedException{message=canCommit encountered an
> > unexpected failure, errorList=[RpcError [message=canCommit encountered
> > an unexpected failure, severity=ERROR,
> >
> >
> > Thanks, Sam
> >
> > [1] https://jenkins.opendaylight.org/releng/view/netvirt-csit/job/netvirt-csit-3node-openstack-ocata-upstream-stateful-carbon/93/

