Re: [controller-dev] ODL Cassandra Persistence

2018-09-18 Thread Muthukumaran K
The Cassandra plugin is quite active, and we also get reasonably good responses from 
the Akka Persistence forums.
Depending on the volume of journals and snapshots, the deployment scheme, and the 
size of snapshots planned to be stored in Cassandra, some level of tuning will be 
required on the backend Cassandra cluster.

Regards
Muthu

From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Michael 
Vorburger
Sent: Wednesday, September 19, 2018 2:42 AM
To: sat 
Cc: controller-dev 
Subject: Re: [controller-dev] ODL Cassandra Persistence

On Tue, 18 Sep 2018, 23:04 sat, 
mailto:sathish.al...@gmail.com>> wrote:
Hi,

Yes, we were looking for a project like this. Unfortunately the project is 
discontinued.

https://github.com/akka/akka-persistence-cassandra seems to be active?

Thanks
A.SathishKumar

On Tue, Sep 18, 2018 at 6:54 AM Tom Pantelis 
mailto:tompante...@gmail.com>> wrote:

On Mon, Sep 17, 2018 at 11:28 PM sat 
mailto:sathish.al...@gmail.com>> wrote:
Hi Michael Vorburger,

Thanks, I will check it out.

Thanks
A.SathishKumar


There is an akka persistence plugin for Cassandra - 
https://github.com/krasserm/akka-persistence-cassandra.  I think this is what 
you're looking for.


On Mon, Sep 17, 2018 at 3:13 PM Michael Vorburger 
mailto:vorbur...@redhat.com>> wrote:

Sat,

On Thu, Sep 13, 2018 at 2:07 AM sat 
mailto:sathish.al...@gmail.com>> wrote:
Hi,

ODL uses "LevelDB" for persistence, and we came to know that it is prone to 
corruption. Has anyone tried using Cassandra for persistence rather than LevelDB?

I see some posts with the same requirement, but there is no reply.

https://pantheon.tech/cassandra-datastore/ is a blog post which may interest 
you in this context; it's from a company that I am not affiliated with (and 
won't be able to further comment on here).

BTW: https://github.com/vorburger/opendaylight-etcd is somewhat related WIP 
work in FLOSS where I'm actively exploring the use of etcd (not Cassandra) as a 
data store.

Tx,
M.
--
Michael Vorburger, Red Hat
vorbur...@redhat.com | IRC: vorburger @freenode | 
~ = http://vorburger.ch



--
A.SathishKumar
044-24735023
___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev


--
A.SathishKumar
044-24735023


Re: [controller-dev] [mdsal-dev] Does committing a transaction with no operations have any noteworthy (real life) overhead compared to cancelling it?

2018-07-27 Thread Muthukumaran K
Hi Robert, 

A slightly orthogonal question, delving more into the overheads.

Would the overhead on the backend not be higher than just trivially marginal for a 
transaction?

In other words, the moment a newXYZTransaction is created, the proxy, context, 
snapshot, txn actors and other related objects get initialized on the backend - is 
that not so?

An empty op list is inferred only during the submit call, so all of the above 
initializations will straight away become garbage, right?

If the count of such "NOOP" txns is small, the pressure from such garbage would not 
be high; but OTOH, if these NOOP txn semantics are misused, their count could become 
humongous, with correspondingly unnecessary garbage and subsequent collection 
cycles - is that not so?

Am I missing something about the laziness / eagerness with which the backend 
objects (heavy / light) are created in the context of each txn?

Regards
Muthu


-Original Message-
From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Robert Varga
Sent: Friday, July 27, 2018 5:25 PM
To: Anil Vishnoi ; Michael Vorburger 

Cc: controller-dev ; 
mdsal-...@lists.opendaylight.org
Subject: Re: [controller-dev] [mdsal-dev] Does committing a transaction with no 
operations have any noteworthy (real life) overhead compared to cancelling it?

On 27/07/18 11:41, Anil Vishnoi wrote:
> 
> My initial reaction is that such an optimization in
> ManagedNewTransactionRunner is probably pointless as whatever
> happens behind the scenes on a commit is surely already smart enough
> by itself for a submit on an empty transaction to basically be a low
> overhead NOOP anyway?
> 
> Or if the transaction API can expose some API like isEmpty() (just an 
> example), that could come in handy here?

I don't think the benefit of such a method justifies additional state tracking 
required to support it.

Regards,
Robert

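For what it's worth, the caller-side guard being debated in this thread - skip the submit entirely when nothing was written - can be sketched as follows. This is a deliberately simplified, hypothetical stand-in, not the real MD-SAL transaction API (which, per Robert above, does not expose isEmpty()):

```java
import java.util.ArrayList;
import java.util.List;

public class NoopGuard {
    /** Hypothetical, minimal stand-in for a write transaction. */
    static class WriteTx {
        final List<String> ops = new ArrayList<>();
        boolean cancelled;
        boolean submitted;

        void put(String path, String data) { ops.add("PUT " + path); }
        void delete(String path)           { ops.add("DEL " + path); }
        boolean isEmpty()                  { return ops.isEmpty(); }
        void cancel()                      { cancelled = true; }
        void submit()                      { submitted = true; }
    }

    /** Submit only if the transaction recorded at least one operation. */
    static boolean submitUnlessEmpty(WriteTx tx) {
        if (tx.isEmpty()) {
            tx.cancel();   // no commit round-trip; backend state can be collected
            return false;
        }
        tx.submit();
        return true;
    }

    public static void main(String[] args) {
        WriteTx empty = new WriteTx();
        System.out.println("empty submitted: " + submitUnlessEmpty(empty));

        WriteTx nonEmpty = new WriteTx();
        nonEmpty.put("/a/b", "data");
        System.out.println("non-empty submitted: " + submitUnlessEmpty(nonEmpty));
    }
}
```

The guard sidesteps the garbage-pressure concern Muthu raises: an empty transaction is cancelled on the caller side instead of being submitted, at the cost of the extra state tracking Robert argues against.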


Re: [controller-dev] ODL abrupt restart - System.exit() via QuarantinedMonitorActorPropsFactory ?

2018-07-05 Thread Muthukumaran K
Hi Michael,

Quarantine is the state in which akka system-level messages can no longer be 
exchanged across the nodes – these include, but are not limited to, heartbeats, 
remote deathwatch, and node state updates.

This article gives a fair idea: https://livingston.io/understanding-akkas-quarantine-state/

Some pointers on what could cause this are discussed here:
https://groups.google.com/forum/#!searchin/akka-user/quarantine|sort:date/akka-user/6cmA1RzE4-s/IaHxhxLhEgAJ

We have seen this suicide in the past during long stop-the-world GCs, as well as 
with *deliberate* (for testing purposes) interface down / up on port 2550 …

Haven't tested this behavior on master yet.

Regards
Muthu




From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Michael 
Vorburger
Sent: Thursday, July 05, 2018 11:12 PM
To: Tom Pantelis 
Cc: Sridhar Gaddam ; Kitt, Stephen ; 
controller-dev 
Subject: Re: [controller-dev] ODL abrupt restart - System.exit() via 
QuarantinedMonitorActorPropsFactory ?

On Thu, Jul 5, 2018 at 7:39 PM, Tom Pantelis 
mailto:tompante...@gmail.com>> wrote:
On Thu, Jul 5, 2018 at 1:35 PM, Michael Vorburger 
mailto:vorbur...@redhat.com>> wrote:
Tom, or Robert, or anyone else having hit this themselves,

would you be able to remind us what in clustering can cause an ODL abrupt 
restart - System.exit() via bundleContext.getBundle(0).stop(); from 
https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/akka/osgi/impl/QuarantinedMonitorActorPropsFactory.java
 ?

I do vaguely recall an "inconsistent cluster" leading to this - can you clarify 
exactly what situation leads to that? Loss of leader? Loss of majority?

asking for https://bugzilla.redhat.com/show_bug.cgi?id=1597304 ...

That happens when akka quarantines a node - it can no longer rejoin the 
majority cluster unless the actor system is restarted, hence we restart the 
whole JVM.

and what can cause Akka to have to quarantine a node?

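The behaviour Tom describes - once akka quarantines a node, the actor system cannot rejoin without being restarted, so the only sane reaction is to restart the whole process - can be sketched as a decision function. The event names and the Action enum below are illustrative stand-ins, not the real akka/ODL API:

```java
public class QuarantineMonitor {
    enum Action { IGNORE, RESTART_PROCESS }

    /** Decide the reaction to a remoting lifecycle event (names illustrative). */
    static Action reactTo(String event) {
        if ("ThisActorSystemQuarantinedEvent".equals(event)) {
            // Rejoining the majority cluster is impossible without a fresh
            // actor system, hence the real factory stops OSGi bundle 0,
            // which in turn exits the JVM so it can come back clean.
            return Action.RESTART_PROCESS;
        }
        // Transient association errors, gated connections etc. are survivable.
        return Action.IGNORE;
    }

    public static void main(String[] args) {
        System.out.println(reactTo("AssociationErrorEvent"));           // IGNORE
        System.out.println(reactTo("ThisActorSystemQuarantinedEvent")); // RESTART_PROCESS
    }
}
```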


Re: [controller-dev] [infrautils-dev] Sharding evolution

2018-06-10 Thread Muthukumaran K
Hi Robert, 

>> Invalid data being written 
This is certainly a coding error. No patch-up makes any sense here.

>> The second one stems from application design: why is the application not 
>> designed in a conflict - free manner
Fully agree !! There is no shortcut here. There is no better first step than the 
single-writer approach, with apps owning a specific part of the data tree and 
adhering to that strictly across the initial feature implementation, enhancements 
and bug fixes. Unless this foundation is there, no other patch-up can be of help.

'Compensatory transactions' are again the domain of the applications and are 
orthogonal to the choice of standalone vs. chained txns, as well as to the type 
of failure.

Regards
Muthu




-Original Message-
From: Robert Varga [mailto:n...@hq.sk] 
Sent: Saturday, June 09, 2018 5:37 PM
To: Muthukumaran K ; Faseela K 
; Anil Vishnoi 
Cc: infrautils-...@lists.opendaylight.org 
; controller-dev 
; genius-...@lists.opendaylight.org 

Subject: Re: [controller-dev] [infrautils-dev] Sharding evolution

Hello Muthu,

There are only two ways in which a transaction can fail, aside from 'datastore 
is busted':
- invalid data being written
- conflicting activity outside of the causality chain

The first one is an obvious coding error, and I don't quite see how you'd design 
a recovery strategy whose complexity does not exceed the complexity of the normal 
path.

The second one stems from application design: why is the application not designed 
in a conflict-free manner? And when a conflict occurs, how do you know its nature 
and how to reconcile it?

You certainly can redo a failed transaction: it is only a matter of holding on to 
the inputs, i.e. the DTCL view is immutable.

Nevertheless, if it's performance you are after, conflicts should happen once in 
a blue moon...

Sent from my BlackBerry - the most secure mobile device - via the Orange Network
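Robert's point that a failed transaction can be redone "by holding on to the inputs" can be sketched like this, with a toy map-backed store standing in for the datastore and simulated transient commit failures. Everything here is hypothetical, not the MD-SAL API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class RedoableCommit {
    private final Map<String, String> store = new HashMap<>();
    private int transientFailures;   // how many commits fail before one succeeds

    RedoableCommit(int transientFailures) { this.transientFailures = transientFailures; }

    /** One commit attempt: apply the retained ops to a scratch view, then publish. */
    private boolean tryCommit(List<Consumer<Map<String, String>>> ops) {
        Map<String, String> scratch = new HashMap<>(store);
        for (Consumer<Map<String, String>> op : ops) {
            op.accept(scratch);
        }
        if (transientFailures > 0) {   // simulated conflict / timeout
            transientFailures--;
            return false;
        }
        store.clear();
        store.putAll(scratch);
        return true;
    }

    /** Because the inputs (ops) are retained, a failed commit can simply be redone. */
    int commitWithRedo(List<Consumer<Map<String, String>>> ops, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryCommit(ops)) {
                return attempt;
            }
        }
        throw new IllegalStateException("commit failed after " + maxAttempts + " attempts");
    }

    Map<String, String> view() { return store; }

    public static void main(String[] args) {
        RedoableCommit ds = new RedoableCommit(2);   // first two commits fail
        List<Consumer<Map<String, String>>> ops = new ArrayList<>();
        ops.add(m -> m.put("/a/b", "v1"));
        ops.add(m -> m.put("/a/c", "v2"));
        int attempts = ds.commitWithRedo(ops, 5);
        System.out.println("committed on attempt " + attempts + ": " + ds.view());
    }
}
```

The key property is the one Robert names: the ops list is the immutable input, so replaying it against a fresh snapshot is always possible; whether replaying is *correct* is the application-design question of the thread.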

  Original Message
From: muthukumara...@ericsson.com
Sent: June 9, 2018 10:10 AM
To: n...@hq.sk; faseel...@ericsson.com; vishnoia...@gmail.com
Cc: infrautils-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org; genius-...@lists.opendaylight.org
Subject: RE: [controller-dev] [infrautils-dev] Sharding evolution

Transaction Chains are also useful in ensuring that the last txn is completed 
before the next is executed, so that the subsequent txn can see the changes made 
by the previous one (of course, within a single subtree) more efficiently. They 
also enable the single-writer discipline.

@Robert,

In the context of a Txn Chain, if 10 txns are submitted and a failure occurs at 
the 5th txn, the chain provides a failure callback.
The most common pattern for apps would be submitting txns to the chain from DTCLs 
or CDTCLs. Assuming 10 change notifications resulted in 10 chained txn submits, 
and the chain fails the 5th txn for valid reasons, the apps have now lost the 
context of the 5 txns which failed.

In such scenarios, what would be a better approach for apps to perform 
compensatory actions for the failed transactions when using a chain?

Regards
Muthu

-Original Message-
From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Robert Varga
Sent: Saturday, June 09, 2018 6:25 AM
To: Faseela K ; Anil Vishnoi 
Cc: infrautils-...@lists.opendaylight.org; controller-dev 
; genius-...@lists.opendaylight.org
Subject: Re: [controller-dev] [infrautils-dev] Sharding evolution

On 09/06/18 02:06, Faseela K wrote:
> [Changed the subject]
> 
>  
> 
> Anil, now you can ask ;)
> 
>  
> 
> https://wiki.opendaylight.org/view/Genius:Sharding_evolution
> 

MD-SAL long-term design:
https://wiki.opendaylight.org/view/MD-SAL:Boron:Conceptual_Data_Tree

Make sure to align your thinking with that... Splitting lists at the MD-SAL level 
runs into the problem of consistent hashing and scatter/gather operations:
- given a key, I must know which shard it belongs to (and that determination has 
to be *quick*)
- anything crossing shards is subject to coordination, which is a *lot* less 
efficient than single-shard commits

If it's performance you are after:
- I cannot stress the importance of TransactionChains enough: if you cannot do 
them, you need to go back to the drawing board, as causality and shared fate 
*must* be properly expressed
- Avoid cross-shard transactions at (pretty much) all cost. I know of *no* reason 
to commit to inventory and topology at the same time - if you have a use case 
which cannot be supported without it, please do describe it (and explain why it 
cannot be done)
- No synchronous operations, anywhere
- Producers (tx.submit()) are just one side of the equation; consumers (DTCL) are 
equally important

Regards,
Robert
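Robert's first bullet - quick key-to-shard determination - is commonly met with a consistent-hash ring. A minimal sketch follows; the shard names, virtual-node count, and FNV-1a hash are illustrative choices, not anything MD-SAL actually prescribes:

```java
import java.util.TreeMap;

public class ShardRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    /** Place a shard on the ring at several virtual points for smoother spread. */
    void addShard(String shard, int virtualNodes) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(shard + "#" + i), shard);
        }
    }

    /** O(log n): first ring point clockwise from the key's hash, wrapping around. */
    String shardFor(String key) {
        Integer point = ring.ceilingKey(hash(key));
        if (point == null) {
            point = ring.firstKey();   // wrap around the ring
        }
        return ring.get(point);
    }

    /** FNV-1a 32-bit; any stable, well-mixed hash works here. */
    private static int hash(String s) {
        int h = 0x811c9dc5;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x01000193;
        }
        return h;
    }

    public static void main(String[] args) {
        ShardRing ring = new ShardRing();
        ring.addShard("inventory", 8);
        ring.addShard("topology", 8);
        for (String key : new String[] {"/nodes/node1", "/nodes/node2", "/links/l1"}) {
            System.out.println(key + " -> " + ring.shardFor(key));
        }
    }
}
```

Lookup is a single tree search, and adding or removing a shard only remaps the keys adjacent to its ring points. Anything that still spans two shards needs a coordinated commit, which is exactly the expensive path Robert warns against.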



Re: [controller-dev] [infrautils-dev] Sharding evolution

2018-06-09 Thread Muthukumaran K
Transaction Chains are also useful in ensuring that the last txn is completed 
before the next is executed, so that the subsequent txn can see the changes made 
by the previous one (of course, within a single subtree) more efficiently. They 
also enable the single-writer discipline.

@Robert,

In the context of a Txn Chain, if 10 txns are submitted and a failure occurs at 
the 5th txn, the chain provides a failure callback.
The most common pattern for apps would be submitting txns to the chain from DTCLs 
or CDTCLs. Assuming 10 change notifications resulted in 10 chained txn submits, 
and the chain fails the 5th txn for valid reasons, the apps have now lost the 
context of the 5 txns which failed.

In such scenarios, what would be a better approach for apps to perform 
compensatory actions for the failed transactions when using a chain?

Regards
Muthu

-Original Message-
From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Robert Varga
Sent: Saturday, June 09, 2018 6:25 AM
To: Faseela K ; Anil Vishnoi 
Cc: infrautils-...@lists.opendaylight.org; controller-dev 
; genius-...@lists.opendaylight.org
Subject: Re: [controller-dev] [infrautils-dev] Sharding evolution

On 09/06/18 02:06, Faseela K wrote:
> [Changed the subject]
> 
>  
> 
> Anil, now you can ask ;)
> 
>  
> 
> https://wiki.opendaylight.org/view/Genius:Sharding_evolution
> 

MD-SAL long-term design:
https://wiki.opendaylight.org/view/MD-SAL:Boron:Conceptual_Data_Tree

Make sure to align your thinking with that... Splitting lists at the MD-SAL level 
runs into the problem of consistent hashing and scatter/gather operations:
- given a key, I must know which shard it belongs to (and that determination has 
to be *quick*)
- anything crossing shards is subject to coordination, which is a *lot* less 
efficient than single-shard commits

If it's performance you are after:
- I cannot stress the importance of TransactionChains enough: if you cannot do 
them, you need to go back to the drawing board, as causality and shared fate 
*must* be properly expressed
- Avoid cross-shard transactions at (pretty much) all cost. I know of *no* reason 
to commit to inventory and topology at the same time - if you have a use case 
which cannot be supported without it, please do describe it (and explain why it 
cannot be done)
- No synchronous operations, anywhere
- Producers (tx.submit()) are just one side of the equation; consumers (DTCL) are 
equally important

Regards,
Robert



Re: [controller-dev] [infrautils-dev] Sharding evolution

2018-06-09 Thread Muthukumaran K


smime.p7m
Description: S/MIME encrypted message


Re: [controller-dev] [infrautils-dev] OK to resurrect c/64522 to first move infrautils.DiagStatus integration for datastore from genius to controller, and then improve it for GENIUS-138 ?

2018-06-07 Thread Muthukumaran K
I can think of one case where colocating the EOS owner of a registered entity 
with a specific shard can fall flat: when the entity owner starts running txns 
against another shard which may not be colocated, we do not get the full benefit 
of the colocation.

For example, if we register an entity named 
"I_want_to_be_colocated_with_shard_A_leader" and have a suitable owner-selection 
strategy to oblige that requirement, the entity gets the full benefit of 
colocation only when its owner confines its interactions to shard A.

If that is what we do for 80% of the functionality, then such an 
ownership-selection strategy would certainly yield good gains.

EOS tries to keep itself agnostic to this by allowing the owner-selection 
strategy to be extended – there are a few strategies already in place.

https://github.com/opendaylight/controller/tree/master/opendaylight/md-sal/sal-distributed-datastore/src/main/java/org/opendaylight/controller/cluster/datastore/entityownership/selectionstrategy

@Tom,
Please correct me if I have missed something

Regards
Muthu



From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Faseela K
Sent: Friday, June 08, 2018 12:19 AM
To: Tom Pantelis 
Cc: infrautils-...@lists.opendaylight.org; controller-dev 
; genius-...@lists.opendaylight.org
Subject: Re: [controller-dev] [infrautils-dev] OK to resurrect c/64522 to first 
move infrautils.DiagStatus integration for datastore from genius to controller, 
and then improve it for GENIUS-138 ?

Tom,
  Currently we have certain cases where we use EOS to ensure that we process a 
set of northbound+southbound events on the same node.
  (I am not sure whether that is the actual purpose of EOS, but we use it like 
that as well. ;))
  This has the issue that, in a 3-node cluster, your entity owner might be node2 
while the datastore you are writing to as a result of the event has its leader on 
node1, so the writes end up being slow. So if I had a mechanism to force 
default-operational-shard DTCNs to be processed on the leader of the 
default-config shard (if my writes resulting from the notifications are going to 
be config-shard writes), I would like to use that. (I am not sure whether I made 
this clear; we can discuss it in our next genius meeting as well. I can point you 
to some usages in genius.)
Thanks,
Faseela

From: Tom Pantelis [mailto:tompante...@gmail.com]
Sent: Friday, June 08, 2018 12:13 AM
To: Faseela K mailto:faseel...@ericsson.com>>
Cc: Michael Vorburger mailto:vorbur...@redhat.com>>; 
infrautils-...@lists.opendaylight.org;
 controller-dev 
mailto:controller-dev@lists.opendaylight.org>>;
 genius-...@lists.opendaylight.org; 
Robert Varga mailto:n...@hq.sk>>
Subject: Re: [infrautils-dev] [controller-dev] OK to resurrect c/64522 to first 
move infrautils.DiagStatus integration for datastore from genius to controller, 
and then improve it for GENIUS-138 ?



On Thu, Jun 7, 2018 at 2:39 PM, Faseela K 
mailto:faseel...@ericsson.com>> wrote:
Not related in this context, but if we can get shard leader change 
notification, can we use that to derive an entity owner instead of using EOS? ;)

Not exactly sure what you mean but shards and EOS are 2 different concepts...


Thanks,
Faseela

From: 
infrautils-dev-boun...@lists.opendaylight.org
 
[mailto:infrautils-dev-boun...@lists.opendaylight.org]
 On Behalf Of Tom Pantelis
Sent: Friday, June 08, 2018 12:07 AM
To: Michael Vorburger mailto:vorbur...@redhat.com>>
Cc: 
infrautils-...@lists.opendaylight.org;
 controller-dev 
mailto:controller-dev@lists.opendaylight.org>>;
 genius-...@lists.opendaylight.org; 
Robert Varga mailto:n...@hq.sk>>
Subject: Re: [infrautils-dev] [controller-dev] OK to resurrect c/64522 to first 
move infrautils.DiagStatus integration for datastore from genius to controller, 
and then improve it for GENIUS-138 ?



On Thu, Jun 7, 2018 at 1:14 PM, Michael Vorburger 
mailto:vorbur...@redhat.com>> wrote:
Robert,

just to avoid any misunderstandings and unnecessary extra work to throw away, 
may we double check and confirm that we correctly understand your comment in  
https://jira.opendaylight.org/browse/GENIUS-138 to mean that we are past the 
"dependency of a mature project on an incubation project" objection and you are 
now OK with that we resurrect https://git.opendaylight.org/gerrit/#/c/64522/, 
to first move infrautils.DiagStatus integration for datastore from genius to 
controller? We would then improve it, in controller instead of genius, for the 
improvement proposed in issue GENIUS-138.

Tom, OK for you as well to have such a dependency from controller to infrautils?

I don't have

Re: [controller-dev] [opendaylight-dev] Topic : "Replacing the MD-SAL data store with etcd or something else"

2018-03-28 Thread Muthukumaran K
Is there any summary of points discussed during the meeting on this topic ?

Regards
Muthu

From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Michael 
Vorburger
Sent: Wednesday, March 28, 2018 10:06 PM
To: Casey Cain
Cc: mdsal-...@lists.opendaylight.org; integration-...@lists.opendaylight.org; 
Dayavanti Gopal Kamath; Stephen Kitt; controller-dev
Subject: Re: [controller-dev] [opendaylight-dev] Topic : "Replacing the MD-SAL 
data store with etcd or something else"

I can come, but did not understand which room... Shall we just meet in that 
developer area room with the tables?

On Wed, 28 Mar 2018, 09:31 Casey Cain, 
mailto:cc...@linuxfoundation.org>> wrote:
Here is the Zoom info.

Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/378977881

Or iPhone one-tap :
US: +16699006833,,378977881# or +16465588656,,378977881#
Or Telephone:
Dial(for higher quality, dial a number based on your current location):
US: +1 669 900 6833 or +1 646 558 8656 or +1 877 369 0926 (Toll Free) or +1 855 
880 1246 (Toll Free)
Meeting ID: 378 977 881
International numbers available: 
https://zoom.us/zoomconference?m=3N5Hunw-pq1P_JpmUXATbXXWSnyHqsBy


On Wed, Mar 28, 2018 at 9:28 AM, Andre Fredette 
mailto:afrede...@redhat.com>> wrote:
Is this confirmed?

Thanks,
Andre

On Tue, Mar 27, 2018 at 10:14 AM Abhijit Kumbhare 
mailto:abhijitk...@gmail.com>> wrote:
Hi folks,

Resurrecting an old thread that we have talked about for a while. I believe 
some folks were planning to discuss this at the DDF but for some reason there 
was no topic proposed on this. Chris, Robert and I were discussing having a 
meeting about this sometime tomorrow morning - say at 10 am. Chris will be able 
to arrange a room for us. We can ask Casey to have a Zoom session. It will be 
great if you guys can attend - at least Robert, Tom, Michael, Stephen, Muthu, 
etc.

Thanks,
Abhijit




On Sun, Jan 1, 2017 at 11:22 PM, Muthukumaran K 
mailto:muthukumara...@ericsson.com>> wrote:
Thanks for the explanation, Robert.

When we refer to access patterns and shard layout, I assume it is in the context 
of a given transaction owned by an application - please correct me if I am wrong.

As a detour, the reference to transactions leads to a bunch of questions on how 
MD-SAL transactions - as we know them in the context of the IMDS - could map onto 
an external backend for a given shard (which the CDT enables).

For a few transaction capabilities which are inherent in the IMDS, there may be 
no equivalent concept/support in the external backend of choice - for example, a 
concept like the Transaction Chain.

Other implementation-specific aspects of transactions could also vary 
drastically - for example:
a) snapshotting+MVCC as the "I" of ACID applies to the IMDS, but may not hold for 
etcd / Cassandra - this mismatch may not have deeper ramifications for the end 
user
b) strong consistency for writes may be do-it-yourself if the backend choice is 
Cassandra, whereas it is ingrained in the IMDS

Does this essentially mean that, based on the choice of backend, some transaction 
concepts would just be no-ops or would throw an unsupported exception 
(e.g. transaction-chain) in order to preserve the uniformity of the broker 
contracts?

Regards
Muthu
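One way the question above could play out in code: a broker adapter over an external backend keeps the contract uniform by refusing, at runtime, the capabilities the backend cannot honour. The interfaces below are hypothetical stand-ins, not the real DOMDataBroker API:

```java
public class ExternalBackendBroker {
    // Hypothetical marker interfaces standing in for the broker contract.
    interface TxChain {}
    interface WriteTx {}

    private final boolean supportsChains;

    ExternalBackendBroker(boolean supportsChains) { this.supportsChains = supportsChains; }

    /** Plain transactions are assumed available on any backend. */
    WriteTx newWriteOnlyTransaction() {
        return new WriteTx() {};
    }

    /** A chain is refused when the backend has no equivalent concept. */
    TxChain createTransactionChain() {
        if (!supportsChains) {
            throw new UnsupportedOperationException(
                "transaction chains are not supported by this backend");
        }
        return new TxChain() {};
    }

    public static void main(String[] args) {
        ExternalBackendBroker etcdLike = new ExternalBackendBroker(false);
        etcdLike.newWriteOnlyTransaction();   // fine on any backend
        try {
            etcdLike.createTransactionChain();
        } catch (UnsupportedOperationException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Whether a capability should throw like this or silently degrade to a no-op is precisely the contract-uniformity question Muthu poses; throwing at least makes the mismatch visible to the application.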




-Original Message-
From: Robert Varga [mailto:n...@hq.sk<mailto:n...@hq.sk>]
Sent: Tuesday, December 27, 2016 9:10 PM
To: Muthukumaran K 
mailto:muthukumara...@ericsson.com>>; Colin Dixon 
mailto:co...@colindixon.com>>; Shrenik Jain 
mailto:shrenik.j...@research.iiit.ac.in>>
Cc: 
integration-...@lists.opendaylight.org<mailto:integration-...@lists.opendaylight.org>
 mailto:d...@lists.opendaylight.org>>; 
controller-dev 
mailto:controller-dev@lists.opendaylight.org>>;
 mdsal-...@lists.opendaylight.org<mailto:mdsal-...@lists.opendaylight.org>; 
intern-ad...@opendaylight.org<mailto:intern-ad...@opendaylight.org>
Subject: Re: [controller-dev] Interested in Contribution : "Replacing the 
MD-SAL data store with etcd"
On 12/27/2016 11:38 AM, Muthukumaran K wrote:
> Hi Robert,
>
> Looking at
> https://github.com/opendaylight/mdsal/blob/master/dom/mdsal-dom-api/sr
> c/main/java/org/opendaylight/mdsal/dom/api/DOMDataTreeShardingService.
> java
>
> Few clarifications:
> For same prefix duplicate producers are not allowed - this is
> understandable. But there is a possibility that two distinct
> producers/shards can have overlapping prefixes - if this is allowed,
> would not there be a scenario wherein we could end up with a superset
> shard and subset shard
>
> Eg. Let's assume there is a subtree whose prefix is /a/b/c/d mapped to shard 
> 1 and another /a/b/c/d/e/f mapped to another shard.
> In other words, we can have recursive shards (hypothetical worst case could 
> be that these recursive shards could have their own backend instance

Re: [controller-dev] [openflowplugin-dev] Controller setup time question

2018-02-26 Thread Muthukumaran K
Hi Martinez,

In order to stress the OpenflowPlugin, we typically use an app called 
bulk-o-matic - 
https://github.com/opendaylight/openflowplugin/blob/master/applications/bulk-o-matic/src/site/asciidoc/bulk-o-matic.adoc
Please note that this app was meant for stressing openflowplugin without any 
dependency on higher level applications.

Please have a look at above document and check if this is closer to what you 
are looking for

Regards
Muthu


From: openflowplugin-dev-boun...@lists.opendaylight.org 
[mailto:openflowplugin-dev-boun...@lists.opendaylight.org] On Behalf Of Michael 
Vorburger
Sent: Monday, February 26, 2018 3:00 PM
To: Martinez Alvarez Guido Francisco
Cc: infrautils-...@lists.opendaylight.org; controller-dev; openflowplugin-dev
Subject: Re: [openflowplugin-dev] [controller-dev] Controller setup time 
question

+openflowplugin-dev & +infrautils-dev:

On Fri, Feb 23, 2018 at 6:55 PM, Martinez Alvarez Guido Francisco 
mailto:g.martinez-alva...@tu-braunschweig.de>>
 wrote:
Dear all, I am currently working on a solution for improving the efficiency of 
SFC/NSH. I am wondering if there are any plugins that can help me determine the 
configuration time of the controller, meaning how fast it can elaborate flow 
rules and set them up at the OF forwarding equipment (controller throughput 
factor).

Any feedback is highly appreciated

I'm unfortunately not sure what exactly you really want to measure and determine 
(flow programming? that would be in the openflowplugin project, not controller?), 
but speaking a little more generally, I thought you may be interested to know 
that we have a "metrics" sub-system in the infrautils project. But that's just an 
API and a way to collect such metrics... what I'm guessing you may be interested 
in is perhaps looking more into contributing code that actually collects such 
metrics in the openflowplugin project and sends them to infrautils.metrics, so 
that you can... do whatever you're after? I'm just guessing.

Tx,
M.
--
Michael Vorburger, Red Hat
vorbur...@redhat.com | IRC: vorburger @freenode | 
~ = http://vorburger.ch


Re: [controller-dev] [opendaylight-dev] YourKit Profiler License

2018-02-14 Thread Muthukumaran K
Thanks a lot :-)

Regards
Muthu


-Original Message-
From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Robert Varga
Sent: Thursday, February 15, 2018 3:34 AM
To: Casey Cain; dev; controller-dev@lists.opendaylight.org
Subject: Re: [controller-dev] [opendaylight-dev] YourKit Profiler License

On 14/02/18 22:49, Casey Cain wrote:
> Hello members of the dev community.
> 
> I'm happy to announce that the ODL community licence for YourKit 
> Profiler has been renewed.
> https://www.yourkit.com/m/manage?t=YDZC2QOT9873VIBCL3B115CANN2

Awesome, thanks. Profile-away at those OOMs and timeouts :)

Regards,
Robert



Re: [controller-dev] Rate limiting during Txn Creation and impact on external systems

2018-02-11 Thread Muthukumaran K
On second thought, for scenarios such as the one below, throttling or 
rate-limiting is better realized beyond the ODL boundary, rather than 
exercising/depending upon the DS-level txn rate-limiter.
So, in a sane deployment model, rate-limiting for cases like the one below should 
prevent requests from even landing in ODL, and should be contained within the 
originator (the Neutron NB client, for example).

Any thoughts ?

Regards
Muthu




From: 
controller-dev-boun...@lists.opendaylight.org<mailto:controller-dev-boun...@lists.opendaylight.org>
 [mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of 
Muthukumaran K
Sent: Monday, February 12, 2018 11:59 AM
To: controller-dev
Subject: [controller-dev] Rate limiting during Txn Creation and impact on 
external systems

Hi,

As we know, the txn rate-limiter *blocks* the creation of new transactions if 
permits are not granted by the rate-limiter.
So, in the context of the direct broker clients within ODL, this creates a level 
of backpressure, which is understandable.

But if the broker clients are invoked from the edge modules (e.g. Neutron NB), 
the external clients could typically block. An external client (e.g. a Neutron 
agent outside the ODL process) may have a timeout and can fail the request as a 
consequence. Are such external systems expected to implement retries based on 
that timeout (with some backoff mechanism, of course)?

What I am trying to understand is: if the external clients do not implement any 
backoff, can this result in thundering-herd problems?

I do not have a real running use case on hand to prove/verify this. I would like 
to hear recommendations if anybody has experienced similar scenarios.

Regards
Muthu





[controller-dev] Rate limiting during Txn Creation and impact on external systems

2018-02-11 Thread Muthukumaran K
Hi,

As we know, the txn rate-limiter *blocks* the creation of new transactions if 
permits are not granted by the rate-limiter.
So, in the context of the direct broker clients within ODL, this creates a level 
of backpressure, which is understandable.

But if the broker clients are invoked from the edge modules (e.g. Neutron NB), 
the external clients could typically block. An external client (e.g. a Neutron 
agent outside the ODL process) may have a timeout and can fail the request as a 
consequence. Are such external systems expected to implement retries based on 
that timeout (with some backoff mechanism, of course)?

What I am trying to understand is: if the external clients do not implement any 
backoff, can this result in thundering-herd problems?

I do not have a real running use case on hand to prove/verify this. I would like 
to hear recommendations if anybody has experienced similar scenarios.

Regards
Muthu
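The retry behaviour the question asks about is usually implemented as capped exponential backoff with jitter, which is the standard mitigation for the thundering-herd concern: clients that timed out together do not retry in lockstep. A hedged sketch (all names illustrative; this is not an ODL or Neutron API):

```java
import java.util.Random;
import java.util.concurrent.Callable;

public class BackoffRetry {
    /** Retry a call up to maxAttempts times, sleeping a jittered, capped
     *  exponential delay between attempts ("full jitter" strategy). */
    static <T> T callWithBackoff(Callable<T> call, int maxAttempts,
                                 long baseMillis, long capMillis) throws Exception {
        Random rnd = new Random();
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e;   // give up, surface the last failure
                }
                long ceiling = Math.min(capMillis, baseMillis * (1L << (attempt - 1)));
                long sleep = (long) (rnd.nextDouble() * ceiling);  // full jitter
                Thread.sleep(sleep);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice (simulating rate-limiter timeouts), then succeeds.
        String result = callWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("ask timeout");
            }
            return "ok";
        }, 5, 1, 50);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "ok after 3 attempts"
    }
}
```

The jitter spreads retries of many clients across the backoff window instead of synchronizing them, which is exactly what an external originator (a Neutron agent, in the example above) would need to avoid hammering ODL after a timeout.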






Re: [controller-dev] Should application code persist do retries on TransactionCommitFailedException caused by AskTimeoutException or could CDS be configured to retry more?

2018-01-18 Thread Muthukumaran K
+Jaya

-Original Message-
From: Robert Varga [mailto:n...@hq.sk] 
Sent: Thursday, January 18, 2018 7:43 PM
To: Jamo Luhrsen; Muthukumaran K; Sam Hague
Cc: controller-dev; genius-...@lists.opendaylight.org; Kency Kurian
Subject: Re: [controller-dev] Should application code persist do retries on 
TransactionCommitFailedException caused by AskTimeoutException or could CDS be 
configured to retry more?

On 12/01/18 23:12, Jamo Luhrsen wrote:
> https://jenkins.opendaylight.org/sandbox/job/netvirt-csit-3node-openst
> ack-ocata-gate-stateful-oxygen/2
> https://jenkins.opendaylight.org/releng/job/netvirt-csit-3node-opensta
> ck-ocata-gate-stateful-nitrogen/13
> 
The sandbox job will be gone in approx 36 hours. The other should stay for 6
months.

Thanks. I have picked up the logs and will try to take a look at them later.

Interesting data point: oxygen job has no warnings from 
FrontendClientMetadataBuilder, but the code should really be the same...

Regards,
Robert



Re: [controller-dev] Should application code persist do retries on TransactionCommitFailedException caused by AskTimeoutException or could CDS be configured to retry more?

2018-01-17 Thread Muthukumaran K
Quick Summary of the functional CSIT Runs for Netvirt cases with tell-based 
proto = ON as well as OFF conditions.

We are triggering one more set of tests with and without the tell-based
protocol enabled, because some failures in the first set indicate that the VMs
themselves were not reachable via SSH. This is to give the benefit of the doubt
to the tell-based protocol and to check whether there are orthogonal issues in
the test environment. We will publish those results too.

Anyway, Ajay's observations with BGP-PCEP indicate that it would not be prudent
to enable the tell-based protocol as of now; it requires deeper assessment.

Tell Based proto = ON
Results : 
https://jenkins.opendaylight.org/sandbox/job/netvirt-csit-3node-openstack-ocata-gate-stateful-oxygen/1/
Pass Rate : 68.8 %

Tell Based proto = OFF
Results : 
https://jenkins.opendaylight.org/sandbox/job/netvirt-csit-3node-openstack-ocata-gate-stateful-oxygen/2/
Pass Rate : 84.9 %

Quick observations from the karaf logs on the first set of runs:

1)  Node unreachability followed by association-failure is equally frequent in
both runs

2)  But in the tell-based protocol scenario, association-failure triggers
re-election more frequently (not sure why, since association-failure
potentially triggering an election is orthogonal to tell/ask enablement)

3)  As a consequence of (2), determining the routee shard times out very
frequently with the tell-based protocol


Will come back with observations on second set of runs soon

Regards
Muthu







From: Ajay Lele [mailto:ajaysl...@gmail.com]
Sent: Friday, January 12, 2018 12:19 PM
To: Muthukumaran K
Cc: Robert Varga; Sam Hague; controller-dev; 
genius-...@lists.opendaylight.org<mailto:genius-...@lists.opendaylight.org>; 
Kency Kurian; Bgpcep-Dev; Luis Gomez Palacios
Subject: Re: [controller-dev] Should application code persist do retries on 
TransactionCommitFailedException caused by AskTimeoutException or could CDS be 
configured to retry more?



On Thu, Jan 11, 2018 at 9:40 PM, Muthukumaran K
<muthukumara...@ericsson.com> wrote:
Hi Sam, Robert,

On the observations which were made as early as September 2017 - 
https://lists.opendaylight.org/pipermail/netvirt-dev/2017-September/005518.html 
(thanks to Jamo for testing this out)
Enabling the tell-based protocol resulted in a 22% CSIT failure rate at the
releng level. More details on the last sandbox and releng runs below.

Having said that, since this is a 3-month-old result and multiple changes would
have gone into netvirt + genius since then, it would be prudent to test the
same with the latest Oxygen build (at least it would reduce the possibility of
misinterpreting netvirt + genius related issues as MD-SAL related issues). We
will do one more sandbox run here at Ericsson with the latest ODL master and
re-publish the results, with and without the tell-based protocol enabled, by
the middle of next week. We will also try to run one round of bulk-flow
provisioning with OFPlugin's bulk-o-matic test driver to see the scale
behavior of the tell-based protocol.

Actually, two runs were performed, one on releng and another in sandbox,
between the last week of August and mid-September 2017 against Nitrogen:

Releng run :
==
https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-3node-openstack-ocata-gate-stateful-nitrogen/7/log.html.gz

Sandbox run :
===
https://logs.opendaylight.org/sandbox/jenkins091/netvirt-csit-3node-openstack-ocata-jamo-upstream-stateful-nitrogen/1/odl1_karaf.log.gz

Jamo's observations from sandbox run :
results are not good. Looks like things pass from a black box perspective in 
our first l2 connectivity suite, but then lots of failures after that.

I also notice that our non-failing keyword to write to the karaf log using ssh 
to the karaf shell is failing, even in the above passing suite.

Also, it's worth noting that in order to enable tell-based protocol I'm just 
stealing a controller robot suite to do the work and running it first.
It makes the config change and reboots all the controllers.

In one karaf log (I only looked at one) I saw a bunch of WARN messages about 
"Unknown history  ignoring..."
example:

  FrontendClientMetadataBuilder| 215 - 
org.opendaylight.controller.sal-distributed-datastore
 - 1.7.0.SNAPSHOT | member
1-shard-topology-operational: Unknown history for aborted transaction 
member-1-datastore-operational-fe-4-txn-7810-1, ignoring

I also saw an ERROR about failure to serialize something or other:

2017-08-29 04:25:12,719 | ERROR | -dispatcher-3279 | EndpointWriter 
  | 41 - com.typesafe.akka.slf4j - 2.4.18
| Failed to serialize remote message [class akka.actor.Status$Failure]
| using serializer [class
akka.serialization.JavaSerializer]. Transient association error (association 
remains live)
akka.remote.MessageSerializer$SerializationException: Failed to serialize 
remote

Re: [controller-dev] Should application code persist do retries on TransactionCommitFailedException caused by AskTimeoutException or could CDS be configured to retry more?

2018-01-11 Thread Muthukumaran K
Hi Sam, Robert, 

On the observations which were made as early as September 2017 - 
https://lists.opendaylight.org/pipermail/netvirt-dev/2017-September/005518.html 
(thanks to Jamo for testing this out)
Enabling the tell-based protocol resulted in a 22% CSIT failure rate at the
releng level. More details on the last sandbox and releng runs below.

Having said that, since this is a 3-month-old result and multiple changes would
have gone into netvirt + genius since then, it would be prudent to test the
same with the latest Oxygen build (at least it would reduce the possibility of
misinterpreting netvirt + genius related issues as MD-SAL related issues). We
will do one more sandbox run here at Ericsson with the latest ODL master and
re-publish the results, with and without the tell-based protocol enabled, by
the middle of next week. We will also try to run one round of bulk-flow
provisioning with OFPlugin's bulk-o-matic test driver to see the scale
behavior of the tell-based protocol.

Actually, two runs were performed, one on releng and another in sandbox,
between the last week of August and mid-September 2017 against Nitrogen:

Releng run :
== 
https://logs.opendaylight.org/releng/jenkins092/netvirt-csit-3node-openstack-ocata-gate-stateful-nitrogen/7/log.html.gz

Sandbox run : 
===
https://logs.opendaylight.org/sandbox/jenkins091/netvirt-csit-3node-openstack-ocata-jamo-upstream-stateful-nitrogen/1/odl1_karaf.log.gz

Jamo's observations from sandbox run :
results are not good. Looks like things pass from a black box perspective in 
our first l2 connectivity suite, but then lots of failures after that.

I also notice that our non-failing keyword to write to the karaf log using ssh 
to the karaf shell is failing, even in the above passing suite.

Also, it's worth noting that in order to enable tell-based protocol I'm just 
stealing a controller robot suite to do the work and running it first.
It makes the config change and reboots all the controllers.

In one karaf log (I only looked at one) I saw a bunch of WARN messages about 
"Unknown history  ignoring..."
example:

  FrontendClientMetadataBuilder| 215 - 
org.opendaylight.controller.sal-distributed-datastore - 1.7.0.SNAPSHOT | member
1-shard-topology-operational: Unknown history for aborted transaction 
member-1-datastore-operational-fe-4-txn-7810-1, ignoring

I also saw an ERROR about failure to serialize something or other:

2017-08-29 04:25:12,719 | ERROR | -dispatcher-3279 | EndpointWriter 
  | 41 - com.typesafe.akka.slf4j - 2.4.18
| Failed to serialize remote message [class akka.actor.Status$Failure] 
| using serializer [class
akka.serialization.JavaSerializer]. Transient association error (association 
remains live)
akka.remote.MessageSerializer$SerializationException: Failed to serialize 
remote message [class akka.actor.Status$Failure] using serializer [class 
akka.serialization.JavaSerializer].

Observations:
===

-Original Message-
From: Robert Varga [mailto:n...@hq.sk] 
Sent: Friday, January 12, 2018 2:11 AM
To: Sam Hague
Cc: Michael Vorburger; Muthukumaran K; Tom Pantelis; controller-dev; 
genius-...@lists.opendaylight.org; Kency Kurian
Subject: Re: [controller-dev] Should application code persist do retries on 
TransactionCommitFailedException caused by AskTimeoutException or could CDS be 
configured to retry more?

Regards
Muthu


On 11/01/18 21:26, Sam Hague wrote:
> Robert,
> 
> when you mention odlparent/yangtools integrated - what does that mean?

I meant the yangtools-2.0.0 stuff needs to be merged up -- which obviously was 
delayed way longer than anticipated.

> do we think that will happen for oxygen?

I would love to have it in, but it does have potential to cause breakage
-- hence I am afraid we are out of runway.

> There are a number of clustering bugs open that all have 
> AskTimeoutException listed in the traces. I think the idea is the tell 
> based change will help and then we can dig deeper if the bugs still exist.

Yup.

> Muthu,
> 
> how did your testing with tell for netvirt tests go? Were we safe 
> switching to it?

*This* is the most critical question that needs to be answered. If netvirt and 
BGP greenlight it, I think we can make the switch ...

Regards,
Robert



Re: [controller-dev] Should application code persist do retries on TransactionCommitFailedException caused by AskTimeoutException or could CDS be configured to retry more?

2017-11-20 Thread Muthukumaran K
Hi Robert, 

I vaguely recollect that the tell-based protocol was perf-tested for one of
the ODL projects (PCEP?? I am not too sure). Are there any links to the
results?

Regards
Muthu


-Original Message-
From: Robert Varga [mailto:n...@hq.sk] 
Sent: Monday, November 20, 2017 9:23 PM
To: Michael Vorburger; Muthukumaran K; Tom Pantelis
Cc: controller-dev; genius-...@lists.opendaylight.org; Kency Kurian
Subject: Re: [controller-dev] Should application code persist do retries on 
TransactionCommitFailedException caused by AskTimeoutException or could CDS be 
configured to retry more?

On 16/11/17 19:49, Michael Vorburger wrote:
> Robert, https://git.opendaylight.org/gerrit/#/c/61002/ was from you, 
> were you planning to pick this up, for Oxygen?

Yes, but it has to wait until we get odlparent/yangtools integrated.

Bye,
Robert



Re: [controller-dev] Circuit Breaker timed out

2017-11-14 Thread Muthukumaran K
Correct, Ajay. We have done a bit of experimentation with increasing the
timeout and we have not seen a recurrence.

We considered following options

1)  Recreate the affected shard

a.   We tried this using a “simulated shard-kill”: upon observing the
shard-kill, the Shard Manager recreates the shard - it did work!  But we stayed
away from this solution mainly because of DTCN side-effects. We do not know how
many applications would tolerate getting DTCNs all of a sudden (similar to a
node restart) in an unexpected manner, because a silent shard restart implies
that applications have to be thoroughly idempotent in handling DTCNs across
shards, so that a single shard recreate does not affect whatever state they
internally build up via DTCNs

2)  Restart the entire controller

a.   A more intrusive change would be to perform a bundle-0 stop upon
onPersistFailure and let the restart logic take care of restarting the node
depending upon the environment (systemd, pacemaker etc.), because the system
would be useless anyway if one shard stops completely. We are trying to get
this working correctly but have been unsuccessful so far.
Regards
Muthu


From: Ajay L [mailto:ajayl@gmail.com]
Sent: Wednesday, November 15, 2017 1:46 AM
To: controller-dev@lists.opendaylight.org
Cc: Muthukumaran K; Srini Seetharaman; Robert Varga; ajaysl...@gmail.com; Sai 
MarapaReddy
Subject: Re: [controller-dev] Circuit Breaker timed out

Hi All,

We are also seeing the "circuit breaker" error under heavy load. When this
happens, the affected shard is stopped and never restarted, and I think the
only way to recover is to restart the node. I have opened
https://jira.opendaylight.org/browse/CONTROLLER-1789 to request better recovery
behavior. Increasing the akka journal persistence circuit-breaker call-timeout
value (default is 10s) does help in making it more tolerant to outages.
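For reference, the call-timeout being discussed is a standard Akka persistence circuit-breaker setting and can be raised in the actor-system configuration. A sketch (the exact file and enclosing section depend on the ODL release - in a clustered ODL it typically lives in the clustering akka.conf - so treat the placement below as an assumption; the values other than call-timeout are the Akka defaults):

```hocon
akka {
  persistence {
    journal-plugin-fallback {
      circuit-breaker {
        max-failures  = 10
        call-timeout  = 30s   # default is 10s; raised per the observation above
        reset-timeout = 30s
      }
    }
  }
}
```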

Regards
Ajay

On Wed, Aug 16, 2017 at 2:23 AM, Robert Varga <n...@hq.sk> wrote:
On 16/08/17 08:37, Muthukumaran K wrote:
> We have not tried on master branch (Nitrogen  / Akka 2.5). Not sure if
> such an issue would go away with Akka 2.5 because the circuit breaker is
> primarily with LevelDB plugin.
>

Nitrogen is on akka-2.4.18. akka-2.5.x (and others) are staged for Oxygen.

Bye,
Robert




Re: [controller-dev] Should application code persist do retries on TransactionCommitFailedException caused by AskTimeoutException or could CDS be configured to retry more?

2017-11-08 Thread Muthukumaran K
Hi Michael,

From what I have experienced, let me try to answer the questions

>>> is this still the case, or is that propose change to master for some known 
>>> old problem that was meanwhile fixed in controller CDS infra?
This is one of the objectives of moving to the tell-based protocol. More
context here: https://bugs.opendaylight.org/show_bug.cgi?id=5280, which also
contains the gerrit topic link related to the changes.
It is disabled by default; see [1].


>>> does it seem right to you that application code handles this? Like wouldn't 
>>> it be better if there was some configuration knob somewhere in controller 
>>> CDS to increase whatever timeout or retry counter >>> is behind when these 
>>> TransactionCommitFailedException caused by akka.pattern.AskTimeoutException 
>>> occur, to tune it to try harder/longer, and not throw any 
>>> TransactionCommitFailed?
There are two typically predominant situations for AskTimeouts: one for total
idleness of the transaction itself, and the other for the overall transaction
timeout.

a)  operation-timeout-in-seconds – default 5. This is very sporadic and almost
never seen in the latest releases.

b)  shard-transaction-commit-timeout-in-seconds – default 30. This is
relatively more frequent in many cases of scale, particularly in HA scenarios
like restarts with configurations.

Both these parameters are part of [1].

[1] $KARAF_HOME/etc/org.opendaylight.controller.cluster.datastore.cfg
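For reference, the file cited as [1] can be adjusted as below. The two values shown are the defaults quoted above; the tell-based protocol switch mentioned earlier also lives in this file. Exact key names may vary per release, so verify against your distribution before changing them:

```
# $KARAF_HOME/etc/org.opendaylight.controller.cluster.datastore.cfg
operation-timeout-in-seconds=5
shard-transaction-commit-timeout-in-seconds=30
use-tell-based-protocol=false
```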

Regards
Muthu



From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Michael 
Vorburger
Sent: Thursday, November 09, 2017 1:50 AM
To: controller-dev
Cc: genius-...@lists.opendaylight.org; Kency Kurian
Subject: [controller-dev] Should application code persist do retries on 
TransactionCommitFailedException caused by AskTimeoutException or could CDS be 
configured to retry more?

Tom and other controllerians,

While code reviewing https://git.opendaylight.org/gerrit/#/c/61526/ for 
https://jira.opendaylight.org/browse/GENIUS-86, I learnt that, apparently 
(quote) "in scale testing, there are too many writes and reads over the 
network, and sometimes these AskTimeout exceptions occur due to the load, it is 
just that for sometime we are not able to reach the other side, but the nodes 
are all healthy, and it comes back soon", and wanted to know:

1. is this still the case, or is that propose change to master for some known 
old problem that was meanwhile fixed in controller CDS infra?

2. does it seem right to you that application code handles this? Like wouldn't 
it be better if there was some configuration knob somewhere in controller CDS 
to increase whatever timeout or retry counter is behind when these 
TransactionCommitFailedException caused by akka.pattern.AskTimeoutException 
occur, to tune it to try harder/longer, and not throw any 
TransactionCommitFailed?

3. when these do occur, is there really a "scenario where even though the
transaction throws a TransactionCommitFailedException (caused by
akka.pattern.AskTimeoutException) it eventually succeeds"? That's what c/61526
proposes to add to the DataBrokerFailures test utility, to test such logic in
application code... in DataBrokerFailuresImpl, it simulates a submit() that
actually did go through and changed the DS (line 95 super.submit().get()) but
then returns immediateFailedCheckedFuture(submitException) anyway. Is that
really what (under this scenario) could happen IRL in production with CDS?
That seems... weird, curious - so its transactions are not really (always)
transactionally to be trusted? ;)

Tx,
M.
--
Michael Vorburger, Red Hat
vorbur...@redhat.com | IRC: vorburger @freenode | 
~ = http://vorburger.ch


Re: [controller-dev] Questions about datastore

2017-10-17 Thread Muthukumaran K
Hi Robert, 

>>>   Could we use the make-leader-local RPC for this, and how?

When this API is invoked while there are inflight transactions (transactions
in various stages, from the submit phase till the DTCN submission phase to the
respective DTCL actors) with the erstwhile leader, what happens to those
transactions? I assume such transactions will face an AskTimeout exception
(when not using the tell-based protocol) - is that correct?

Regards
Muthu


-Original Message-
From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Robert Varga
Sent: Wednesday, October 18, 2017 3:06 AM
To: Michael Vorburger
Cc: controller-dev@lists.opendaylight.org; mdsal-...@lists.opendaylight.org
Subject: [controller-dev] Questions about datastore

Hello Michael,

we discussed the items you have put on today's Kernel Projects' Call agenda.

> Stephen & Michael: Follow-up to sharding breakout session at DDF
> 
> How to have a single shard for operational and config, but with/without 
> persistence?

This is not possible. The Operational and Configuration data stores are always
separate instances of a DataTree.

Note that the ability to perform atomic commits across both config and oper 
data stores is going away as per the requirements and feedback gathered during 
the design of Conceptual Data Tree. See 
https://lists.opendaylight.org/pipermail/controller-dev/2016-April/011798.html
for details.

> How to control shard leader election, to force separate operational and 
> config shard to be on the same node?
> Could we use the make-leader-local RPC for this, and how?

Yes, and I do believe CSIT does this.

As for coordinated shard failover control, there is no provision to do this 
within CDS.

> Anyone has any objections to a Gerrit changing (upstream's) default shard 
> configuration file to 1 shard?

Is there a particular reason to do this?

I think we wanted to have this to make it obvious that topology and inventory 
are distinct...

Thanks,
Robert



Re: [controller-dev] Best way to gracefully shutdown Karaf in ODL context

2017-10-15 Thread Muthukumaran K
Hi Tom,

So, we should still be doing the bundle 0 stop for the quarantine case? I
presume so because this expectation is from Akka – is that right?

>>> If you want to push a patch to fix it, I'll merge it.
Sure, Tom. Will do a local quarantine test with the change and push it.

Regards
Muthu



From: Tom Pantelis [mailto:tompante...@gmail.com]
Sent: Friday, October 13, 2017 6:40 PM
To: Muthukumaran K
Cc: Daniel Farrell; Jamo Luhrsen; controller-dev@lists.opendaylight.org; 
integration-...@lists.opendaylight.org
Subject: Re: [controller-dev] Best way to gracefully shutdown Karaf in ODL 
context



On Fri, Oct 13, 2017 at 8:57 AM, Tom Pantelis <tompante...@gmail.com> wrote:


On Fri, Oct 13, 2017 at 12:59 AM, Muthukumaran K
<muthukumara...@ericsson.com> wrote:
Thanks a lot for the pointers Daniel and JamO.

https://git.opendaylight.org/gerrit/gitweb?p=releng/builder.git;a=blob;f=jjb/packaging/stop-odl.sh;h=2e3e7bf15dfbe6e59bddfbfd4ce4805fb47b2a69;hb=refs/heads/master#l27
 which aligns with my thought too .. ☺

Just a clarification: has there been any situation you can recollect where the
karaf PID lingered abnormally long (beyond 10–15 mins) during the stop phase?
I have seen this once using the vanilla distro but was never able to reproduce
it for the past month or so, even after several day-to-day restarts. Maybe it
was a local environment issue. So, I was a bit reserved about rolling the
approach of stop followed by waiting till the PID vanishes into production.

@Tom, @Robert,

Not directly related but I will fire away …

The erstwhile
https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-clustering-commons/src/main/java/org/opendaylight/controller/cluster/common/actor/QuarantinedMonitorActor.java
 used to restart the entire container, and now on master the Quarantined state
just restarts the ActorSystem – is my understanding right?

It restarts the enclosing bundle:

return QuarantinedMonitorActor.props(() -> {
// restart the entire karaf container
LOG.warn("Restarting karaf container");
System.setProperty("karaf.restart.jvm", "true");
bundleContext.getBundle().stop();
});

It used to restart bundle 0. Not sure why that was changed

Looks like this was inadvertently changed by 
https://git.opendaylight.org/gerrit/#/c/62451/ - it used to be
 bundleContext.getBundle(0).stop();

If you want to push a patch to fix it, I'll merge it.



Regards
Muthu



From: Daniel Farrell [mailto:dfarr...@redhat.com]
Sent: Friday, October 13, 2017 6:19 AM
To: Jamo Luhrsen; Muthukumaran K; controller-dev@lists.opendaylight.org;
integration-...@lists.opendaylight.org
Subject: Re: [controller-dev] Best way to gracefully shutdown Karaf in ODL 
context

Hey Muthu,

Yes, I think you should take a look at the systemd configuration we ship in 
ODL's packages. As far as I know it does a good job of 
starting/stopping/restarting ODL's service.

https://git.opendaylight.org/gerrit/gitweb?p=integration/packaging.git;a=blob;f=packages/rpm/unitfiles/opendaylight.service;h=ac436592d2880047986b856c7dd6810665ba0d3e;hb=refs/heads/master

Here's a Nitrogen RPM that contains that systemd config:

http://cbs.centos.org/repos/nfv7-opendaylight-70-release/x86_64/os/Packages/opendaylight-7.0.0-1.el7.noarch.rpm

This test job shows examples of `sudo systemctl [start, stop, status]` working:

https://jenkins.opendaylight.org/releng/job/packaging-test-rpm-master

The logic for that job is here:

https://git.opendaylight.org/gerrit/gitweb?p=releng/builder.git;a=blob;f=jjb/packaging/packaging.yaml;h=e4de235ca543506063b7fb57c3d257f0b983abe3;hb=refs/heads/master#l346

That systemd config is also exercised in tests for puppet-opendaylight, 
ansible-opendaylight, OPNFV Apex and other OPNFV installers.

It seems like you've put some good thought into this, so if you have any 
suggestions for things we can do better please let us know. :)

Daniel

On Thu, Oct 12, 2017 at 11:47 AM Jamo Luhrsen <jluhr...@gmail.com> wrote:
+Daniel and Integration-dev,

Daniel,

does our rpm package and the systemd work you did for it answer any of Muthu's
questions below? I'm assuming it *IS* the answer, but you will know better.

Thanks,
JamO

On 10/12/2017 04:56 AM, Muthukumaran K wrote:
> Hi,
>
> * *
>
> *Context* : Figuring out the best possible way to gracefully shutdown Karaf 
> process using standard Karaf commands.
>
> This would be required because framework-level shutdown-sequence in Karaf 
> would give opportunity framework to properly
> execute bundle lifecycle listeners. What I mean is – abrupt kill can 
> potentially prevent lifecycle listeners from being
> properly executed and may also i

Re: [controller-dev] Best way to gracefully shutdown Karaf in ODL context

2017-10-12 Thread Muthukumaran K
Thanks a lot for the pointers Daniel and JamO.

https://git.opendaylight.org/gerrit/gitweb?p=releng/builder.git;a=blob;f=jjb/packaging/stop-odl.sh;h=2e3e7bf15dfbe6e59bddfbfd4ce4805fb47b2a69;hb=refs/heads/master#l27
 which aligns with my thought too .. ☺

Just a clarification: has there been any situation you can recollect where the
karaf PID lingered abnormally long (beyond 10–15 mins) during the stop phase?
I have seen this once using the vanilla distro but was never able to reproduce
it for the past month or so, even after several day-to-day restarts. Maybe it
was a local environment issue. So, I was a bit reserved about rolling the
approach of stop followed by waiting till the PID vanishes into production.

@Tom, @Robert,

Not directly related but I will fire away …

The erstwhile
https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-clustering-commons/src/main/java/org/opendaylight/controller/cluster/common/actor/QuarantinedMonitorActor.java
 used to restart the entire container, and now on master the Quarantined state
just restarts the ActorSystem – is my understanding right?

Regards
Muthu



From: Daniel Farrell [mailto:dfarr...@redhat.com]
Sent: Friday, October 13, 2017 6:19 AM
To: Jamo Luhrsen; Muthukumaran K; controller-dev@lists.opendaylight.org; 
integration-...@lists.opendaylight.org
Subject: Re: [controller-dev] Best way to gracefully shutdown Karaf in ODL 
context

Hey Muthu,

Yes, I think you should take a look at the systemd configuration we ship in 
ODL's packages. As far as I know it does a good job of 
starting/stopping/restarting ODL's service.

https://git.opendaylight.org/gerrit/gitweb?p=integration/packaging.git;a=blob;f=packages/rpm/unitfiles/opendaylight.service;h=ac436592d2880047986b856c7dd6810665ba0d3e;hb=refs/heads/master

Here's a Nitrogen RPM that contains that systemd config:

http://cbs.centos.org/repos/nfv7-opendaylight-70-release/x86_64/os/Packages/opendaylight-7.0.0-1.el7.noarch.rpm

This test job shows examples of `sudo systemctl [start, stop, status]` working:

https://jenkins.opendaylight.org/releng/job/packaging-test-rpm-master

The logic for that job is here:

https://git.opendaylight.org/gerrit/gitweb?p=releng/builder.git;a=blob;f=jjb/packaging/packaging.yaml;h=e4de235ca543506063b7fb57c3d257f0b983abe3;hb=refs/heads/master#l346

That systemd config is also exercised in tests for puppet-opendaylight, 
ansible-opendaylight, OPNFV Apex and other OPNFV installers.

It seems like you've put some good thought into this, so if you have any 
suggestions for things we can do better please let us know. :)

Daniel

On Thu, Oct 12, 2017 at 11:47 AM Jamo Luhrsen <jluhr...@gmail.com> wrote:
+Daniel and Integration-dev,

Daniel,

does our rpm package and the systemd work you did for it answer any of Muthu's
questions below? I'm assuming it *IS* the answer, but you will know better.

Thanks,
JamO

On 10/12/2017 04:56 AM, Muthukumaran K wrote:
> Hi,
>
> * *
>
> *Context* : Figuring out the best possible way to gracefully shutdown Karaf 
> process using standard Karaf commands.
>
> This would be required because framework-level shutdown-sequence in Karaf 
> would give opportunity framework to properly
> execute bundle lifecycle listeners. What I mean is – abrupt kill can 
> potentially prevent lifecycle listeners from being
> properly executed and may also impact any inflight transactions which may be 
> in various stages of replication and/or commit
> phases. This can in turn lead to troubles during recovery / restart phase.
>
>
>
> So, I thought of middle-ground where
>
> 1)  We execute karaf stop followed by
>
> 2)  Periodic check  if the last PID indeed terminates
>
>
>
> Doing a straight kill -9 could lead to rare heisenbugs during wherein 
> recovery could suffer since there may not be room for
> lifecycle listeners to execute (unless Karaf handles it as unified 
> shutdownhook and execute same path as that of stop or any
> graceful shutdown methods)
>
>
>
> Have anybody tried any better methods without side-effects ?
>
>
>
>
>
> *Option was tried and observation is as follows *
>
> Using Karaf stop followed by Karaf status command to check if the process has 
> come to a graceful termination. But, it appears
> that though ‘status’ command reports Karaf instance as ‘Not Running’, the PID 
> still lingers for 2 to 3 mins roughly in ODL
> context. I am biased to think that there are indeed some lifecycle listeners 
> executing … During this ‘PID lingering’ phase,
> the thread-dump hints the System Bundle Shutdown is waiting for the BP 
> container to shutdown the components (probably
> executing the lifecycle listeners at application and platform levels)
>
>
>
> "System Bundle Shutdown" #1582 daemon prio=5 os_prio=0 tid=0x7fb05003d800 

[controller-dev] Best way to gracefully shutdown Karaf in ODL context

2017-10-12 Thread Muthukumaran K
Hi,



Context : Figuring out the best possible way to gracefully shutdown Karaf 
process using standard Karaf commands.

This would be required because the framework-level shutdown sequence in Karaf
gives the framework an opportunity to properly execute bundle lifecycle
listeners. What I mean is: an abrupt kill can potentially prevent lifecycle
listeners from being properly executed, and may also impact any inflight
transactions which may be in various stages of the replication and/or commit
phases. This can in turn lead to trouble during the recovery/restart phase.



So, I thought of a middle ground where:

1)  We execute karaf stop, followed by

2)  A periodic check that the last PID indeed terminates
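The two steps above can be sketched as a small script, in the spirit of the stop-odl.sh linked later in this thread. The paths and the timeout value are illustrative assumptions, not part of any shipped distribution:

```shell
#!/bin/sh
# Poll until a PID is gone, or give up after a timeout (in seconds).
# Prints "stopped" on success and "timeout" on failure.
wait_for_pid_exit() {
    pid="$1"
    timeout="$2"
    waited=0
    # kill -0 sends no signal; it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timeout"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "stopped"
    return 0
}

# Typical usage (paths are illustrative; adjust to your install):
#   PID=$(cat "$KARAF_HOME/karaf.pid")
#   "$KARAF_HOME/bin/stop"
#   wait_for_pid_exit "$PID" 300 || kill -9 "$PID"   # kill -9 only as last resort
```

Waiting on the PID rather than on `karaf status` matters because, as observed below, `status` can report 'Not Running' while the JVM is still executing shutdown listeners.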



Doing a straight kill -9 could lead to rare heisenbugs wherein recovery could
suffer, since there may not be room for lifecycle listeners to execute (unless
Karaf handles it as a unified shutdown hook and executes the same path as stop
or any other graceful shutdown method).



Has anybody tried any better methods without side-effects?





One option was tried, and the observation is as follows:

Using karaf stop followed by the karaf status command to check whether the
process has come to a graceful termination: it appears that though the 'status'
command reports the Karaf instance as 'Not Running', the PID still lingers for
roughly 2 to 3 minutes in the ODL context. I am biased to think that there are
indeed some lifecycle listeners executing... During this 'PID lingering' phase,
the thread dump hints that the System Bundle Shutdown is waiting for the BP
container to shut down the components (probably executing the lifecycle
listeners at the application and platform levels)


"System Bundle Shutdown" #1582 daemon prio=5 os_prio=0 tid=0x7fb05003d800 nid=0xe68 waiting on condition [0x7faf77678000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0xe9064250> (a com.google.common.util.concurrent.AbstractFuture$Sync)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:268)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
    at org.opendaylight.openflowplugin.openflow.md.core.MDController.stop(MDController.java:358)
    at org.opendaylight.openflowplugin.openflow.md.core.sal.OpenflowPluginProvider.close(OpenflowPluginProvider.java:121)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.aries.blueprint.utils.ReflectionUtils.invoke(ReflectionUtils.java:299)
    at org.apache.aries.blueprint.container.BeanRecipe.invoke(BeanRecipe.java:980)
    at org.apache.aries.blueprint.container.BeanRecipe.destroy(BeanRecipe.java:887)
    at org.apache.aries.blueprint.container.BlueprintRepository.destroy(BlueprintRepository.java:329)
    at org.apache.aries.blueprint.container.BlueprintContainerImpl.destroyComponents(BlueprintContainerImpl.java:765)
    at org.apache.aries.blueprint.container.BlueprintContainerImpl.tidyupComponents(BlueprintContainerImpl.java:964)
    at org.apache.aries.blueprint.container.BlueprintContainerImpl.destroy(BlueprintContainerImpl.java:909)
    at org.apache.aries.blueprint.container.BlueprintExtender$3.run(BlueprintExtender.java:325)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.aries.blueprint.container.BlueprintExtender.destroyContainer(BlueprintExtender.java:346)
    at org.apache.aries.blueprint.container.BlueprintExtender.access$400(BlueprintExtender.java:68)
    at org.apache.aries.blueprint.container.BlueprintExtender$BlueprintContainerServiceImpl.destroyContainer(BlueprintExtender.java:624)
    at org.opendaylight.controller.blueprint.BlueprintBundleTracker.shutdownAllContainers(BlueprintBundleTracker.java:251)
    at org.opendaylight.controller.blueprint.BlueprintBundleTracker.bundleChanged(BlueprintBundleTracker.java:150)
    at org.eclipse.osgi.framework.internal.core.BundleContextImpl.dispatchEvent(Bu

Re: [controller-dev] Expose Datastore health to applications via infrautils.diagstatus

2017-10-12 Thread Muthukumaran K
Thanks Tom. Then we would use the aggregate SyncStatus at the ShardManager 
level.

If we need to drill down further at the shard level (though I do not readily 
have a use case for that), we can use the shard-level MXBeans anyway for any 
manual troubleshooting.

Regards
Muthu


From: Tom Pantelis [mailto:tompante...@gmail.com]
Sent: Thursday, October 12, 2017 1:05 PM
To: Muthukumaran K
Cc: Faseela K; infrautils-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org; R Srinivasan E; Dayavanti Gopal Kamath
Subject: Re: [controller-dev] Expose Datastore health to applications via 
infrautils.diagstatus



On Thu, Oct 12, 2017 at 3:08 AM, Muthukumaran K <muthukumara...@ericsson.com> wrote:
Hi Tom,

While the initial status of the CDS is inferable using the aggregate 
SyncStatus, for dynamic status (eg. after startup, leader mobility in cluster 
due to load, availability scenarios like node-loss etc.), we were thinking of 
explicitly checking if all configured shards do have the leader or not (of 
course using the Shard Level MBeans).

But from your mail I understand that the aggregate SyncStatus being set to 
false can be an easier way to address dynamic changes post-start, instead of 
doing shard-wise checking.

Is my understanding correct?


That is correct. The shard will report a sync status change if it's a follower 
and the leader changes or if it goes to candidate. Of course if it's the 
leader, its sync status is automatically true. Also a follower shard will 
report it's not in sync if it lags behind the leader by a certain # of commits 
(default 10).

Regards
Muthu


From: controller-dev-boun...@lists.opendaylight.org [mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Tom Pantelis
Sent: Thursday, October 12, 2017 12:28 PM
To: Faseela K
Cc: infrautils-...@lists.opendaylight.org; controller-dev@lists.opendaylight.org; R Srinivasan E; Dayavanti Gopal Kamath
Subject: Re: [controller-dev] Expose Datastore health to applications via 
infrautils.diagstatus



On Wed, Oct 11, 2017 at 2:16 PM, Faseela K <faseel...@ericsson.com> wrote:
Hello controller-dev,

   We @ infrautils have developed a status-and-diagnostics framework where 
applications can register their services and report when they are functionally 
up. Northbound and southbound interfaces for ODL can then open up and accept 
configurations once all the required services are UP. As part of this, we were 
thinking of having a “DATASTORE” service whose status can be shown as 
“OPERATIONAL” when all the shards have properly elected their leaders. We do 
see several MBeans exposed by the controller repo under 
org.opendaylight.controller:Category=Shards,name="++",type=DistributedConfigDatastore
 which could be used to derive the same information.
   Instead of doing that from outside, we wanted to explore the possibility of 
integrating controller.sal-distributed-datastore with infrautils.diagstatus to 
report the status when the initial shard leader election is complete, and to 
implement the dynamic poll interface to fetch the shard leader status at 
arbitrary points in time. Please share your thoughts.

This sounds like a reasonable idea.  CDS does have an aggregated shard sync 
status that is collected and reported by the ShardManager to the 
ShardManagerInfo MBean's SyncStatus attribute for each data store (eg 
type=DistributedConfigDatastore,Category=ShardManager,name=shard-manager-config).
 Once all shards report that they are "in sync" (ie a leader is elected and, if 
it's a follower, its journal is up-to-date with the leader),  the ShardManager 
sets the aggregate SyncStatus to true. Subsequently, if a shard loses its 
leader, the aggregate SyncStatus will be set to false.

I'm not familiar enough with infrautils.diagstatus to know exactly how this 
status would be reported to that component. This would also require the 
controller project to depend on infrautils - not sure if that would be OK?

Also, separate from SyncStatus, CDS blocks its blueprint startup until all 
shards have elected a leader  (up to 90 sec) so its OSGi services aren't 
advertised until then. Therefore all bundles that import those services will 
also be blocked on startup.
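The aggregate status described above can also be read programmatically over JMX. A minimal, hedged sketch (ObjectName and attribute name as given in this thread; the MBean is of course only registered inside a running controller JVM, and error handling is kept to the minimum):

```java
import java.lang.management.ManagementFactory;

import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public final class DatastoreSyncCheck {
    /** Reads the ShardManager's aggregate SyncStatus for the config datastore. */
    public static boolean isConfigDatastoreInSync() throws JMException {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // ObjectName as quoted in this thread for the config datastore's ShardManager.
        ObjectName shardManager = new ObjectName(
                "org.opendaylight.controller:type=DistributedConfigDatastore,"
                + "Category=ShardManager,name=shard-manager-config");
        return (Boolean) mbs.getAttribute(shardManager, "SyncStatus");
    }
}
```

A diagstatus-style poller could call this periodically and flip the reported service state whenever the attribute changes.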



Thanks,
Faseela

___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev




Re: [controller-dev] Expose Datastore health to applications via infrautils.diagstatus

2017-10-12 Thread Muthukumaran K
Hi Tom,

While the initial status of the CDS is inferable using the aggregate 
SyncStatus, for dynamic status (eg. after startup, leader mobility in cluster 
due to load, availability scenarios like node-loss etc.), we were thinking of 
explicitly checking if all configured shards do have the leader or not (of 
course using the Shard Level MBeans).

But from your mail I understand that the aggregate SyncStatus being set to 
false can be an easier way to address dynamic changes post-start, instead of 
doing shard-wise checking.

Is my understanding correct?

Regards
Muthu


From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Tom Pantelis
Sent: Thursday, October 12, 2017 12:28 PM
To: Faseela K
Cc: infrautils-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org; R Srinivasan E; Dayavanti Gopal Kamath
Subject: Re: [controller-dev] Expose Datastore health to applications via 
infrautils.diagstatus



On Wed, Oct 11, 2017 at 2:16 PM, Faseela K <faseel...@ericsson.com> wrote:
Hello controller-dev,

   We @ infrautils have developed a status-and-diagnostics framework where 
applications can register their services and report when they are functionally 
up. Northbound and southbound interfaces for ODL can then open up and accept 
configurations once all the required services are UP. As part of this, we were 
thinking of having a “DATASTORE” service whose status can be shown as 
“OPERATIONAL” when all the shards have properly elected their leaders. We do 
see several MBeans exposed by the controller repo under 
org.opendaylight.controller:Category=Shards,name="++",type=DistributedConfigDatastore
 which could be used to derive the same information.
   Instead of doing that from outside, we wanted to explore the possibility of 
integrating controller.sal-distributed-datastore with infrautils.diagstatus to 
report the status when the initial shard leader election is complete, and to 
implement the dynamic poll interface to fetch the shard leader status at 
arbitrary points in time. Please share your thoughts.

This sounds like a reasonable idea.  CDS does have an aggregated shard sync 
status that is collected and reported by the ShardManager to the 
ShardManagerInfo MBean's SyncStatus attribute for each data store (eg 
type=DistributedConfigDatastore,Category=ShardManager,name=shard-manager-config).
 Once all shards report that they are "in sync" (ie a leader is elected and, if 
it's a follower, its journal is up-to-date with the leader),  the ShardManager 
sets the aggregate SyncStatus to true. Subsequently, if a shard loses its 
leader, the aggregate SyncStatus will be set to false.

I'm not familiar enough with infrautils.diagstatus to know exactly how this 
status would be reported to that component. This would also require the 
controller project to depend on infrautils - not sure if that would be OK?

Also, separate from SyncStatus, CDS blocks its blueprint startup until all 
shards have elected a leader  (up to 90 sec) so its OSGi services aren't 
advertised until then. Therefore all bundles that import those services will 
also be blocked on startup.



Thanks,
Faseela




Re: [controller-dev] OOM Bug 9034

2017-08-23 Thread Muthukumaran K
BZ 9034 :

(1) a lot of those "ERROR ShardDataTree 
org.opendaylight.controller.sal-distributed-datastore - 1.5.2.Carbon | 
member-0-shard-default-operational: Failed to commit transaction ... 
java.lang.IllegalStateException: Store tree 
org.opendaylight.yangtools.yang.data.api.schema.tree.spi.MaterializedContainerNode@78fe0203
 and candidate base 
org.opendaylight.yangtools.yang.data.api.schema.tree.spi.MaterializedContainerNode@686861e8
 differ" errors - seems vaguely familiar from recent list posts; someone 
remind me, what were those all about again?


The symptom seems similar to the one observed during the PreLeader fix. But 
that was done long back, in Beryllium itself.

Michael, is any restart involved in the testing?

Regards
Muthu


-Original Message-
From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Robert Varga
Sent: Wednesday, August 23, 2017 5:12 PM
To: Michael Vorburger; controller-dev; mdsal-...@lists.opendaylight.org
Subject: Re: [controller-dev] OOM Bug 9034

On 23/08/17 13:33, Robert Varga wrote:
>> As far as I can see, with my still very limited understanding of 
>> mdsal internals, this does not seem to be the same as our earlier
>> https://bugs.opendaylight.org/show_bug.cgi?id=8941 raised by Stephen 
>> and fixed by Robert (which is being follow-up on in
>> https://bugs.opendaylight.org/show_bug.cgi?id=9028 by Robert and Tom) 
>> - does this initial quick analysis seem accurate to you?
> 9034 looks like 8941, except it's not transactions, but chains. I'll 
> cook up a prototype.

Actually no. The backend side of things looks okay, as chains are being both 
closed and purged when requested from the frontend.

I suspect somebody is forgetting to close their transaction chains...

Regards,
Robert



Re: [controller-dev] Circuit Breaker timed out

2017-08-15 Thread Muthukumaran K
Sorry for the delayed response, Srini.

We have not tried the master branch (Nitrogen / Akka 2.5). I am not sure such 
an issue would go away with Akka 2.5, because the circuit breaker sits 
primarily in the LevelDB plugin.

In about 20 days we have not been able to reproduce this issue consistently; 
we have seen it only once, on one of the cluster nodes. We are using plain 
Ubuntu VMs to bring up the cluster.
‘dmesg’ also did not indicate any issues with respect to the disk.

Some “theoretical” candidates we started suspecting were:

a)  Compaction in LevelDB colliding with incoming writes – i.e. heavy 
compaction delaying the incoming LevelDB writes

b)  A difference between the VM's disk and the host's disk performance – we 
may have to test this by doing heavy ‘dd’ runs on the disks to see whether 
disk writes are slow at a different level altogether

We are yet to confirm either. Consulting the Akka Google groups also did not 
help, because the main recommendation there is that LevelDB is not meant for 
production. But even if we use Cassandra (for journal persistence, for 
example), a timeout is logically still possible – perhaps the Cassandra plugin 
handles such cases in a better manner.

Regards
Muthu



From: srini...@gmail.com [mailto:srini...@gmail.com] On Behalf Of Srini 
Seetharaman
Sent: Saturday, August 12, 2017 1:55 AM
To: Muthukumaran K
Cc: Tom Pantelis; controller-dev@lists.opendaylight.org
Subject: Re: [controller-dev] Circuit Breaker timed out

Or was there a real disk issue in that machine you were using?

On Fri, Aug 11, 2017 at 10:58 AM, Srini Seetharaman <srini.seethara...@gmail.com> wrote:
Muthu,
It's worrisome to hear that you've seen this too. Did it go away with Nitrogen 
or with moving to Akka 2.5 persistence?

I am referring to the following params within the persistence section of 
akka.conf


 circuit-breaker {
    max-failures = 10
    call-timeout = 10s
    reset-timeout = 30s
 }
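For orientation, these parameters sit under Akka persistence's journal-plugin-fallback section, which the LevelDB journal inherits unless it overrides them itself. A hedged sketch of the fully qualified form (path and defaults per Akka's reference configuration; tune only after measuring):

```hocon
# akka.conf fragment (HOCON); values shown are the stock defaults.
akka.persistence.journal-plugin-fallback {
  circuit-breaker {
    max-failures  = 10
    call-timeout  = 10s
    reset-timeout = 30s
  }
}
```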


On Thu, Aug 10, 2017 at 10:17 PM, Muthukumaran K <muthukumara...@ericsson.com> wrote:
Hi Tom, Srini,

We have also noticed this with Boron very sporadically even without any 
explicit action taken on shard like Srini did

Srini,

Are you referring to “journal-plugin-fallback” from 
http://doc.akka.io/docs/akka/current/scala/general/configuration.html#config-akka-persistence ?

Regards
Muthu

From: controller-dev-boun...@lists.opendaylight.org [mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Srini Seetharaman
Sent: Friday, August 11, 2017 9:40 AM
To: Tom Pantelis
Cc: controller-dev@lists.opendaylight.org
Subject: Re: [controller-dev] Circuit Breaker timed out

Thanks Tom. I will investigate further on why the local disk operation failed. 
Seems strange though because I haven't seen anything in dmesg.

The default value for the call-timeout is 10s in akka.conf.

On Thu, Aug 10, 2017 at 3:20 PM, Tom Pantelis <tompante...@gmail.com> wrote:
That error is from  akka persistence. It happens if the backend persistence 
plugin doesn't respond back in time. I've only seen this in a CSIT environment 
whose disk activity was overloaded. The timeouts can be tweaked - I don't 
recall exactly what they are but you can find them in the akka docs (names 
contain circuit-breaker).

On Thu, Aug 10, 2017 at 6:01 PM, Srini Seetharaman <srini.seethara...@gmail.com> wrote:
Hi Tom,
In our ODL deployment that is running in standalone mode with operational store 
persistence enabled, we saw the following error being printed. Once the 
member-1-default-operational shard is shutdown, all write transactions after 
that fail and the system becomes unstable. At this point, we were probably 
doing less than 10 transactions per second. Any idea what is causing this? Has 
anyone seen this before?


2017-08-07 19:15:59,622 | ERROR | lt-dispatcher-23 | Shard  
  | 176 - com.typesafe.akka.slf4j - 2.4.7 | Failed to persist event type 
[org.opendaylight.controller.cluster.raft.ReplicatedLogImplEntry] with sequence 
number [9897493] for persistenceId [member-1-shard-default-operational].
akka.pattern.CircuitBreaker$$anon$1: Circuit Breaker Timed out.
2017-08-07 19:15:59,628 | INFO  | lt-dispatcher-24 | Shard  
  | 188 - org.opendaylight.controller.sal-akka-raft - 1.4.2.Boron-SR2 | 
Stopping Shard member-1-shard-default-operational
2017-08-07 19:15:59,629 | ERROR | lt-dispatcher-23 | 
LocalThreePhaseCommitCohort  | 193 - 
org.opendaylight.controller.sal-distributed-datastore - 1.4.2.Boron-SR2 | 
Failed to prepare transaction 
member-1-datastore-operational-fe-5-txn-791019 on backend
java.lang.RuntimeException: Transaction aborted due to shutdo

Re: [controller-dev] Circuit Breaker timed out

2017-08-10 Thread Muthukumaran K
Hi Tom, Srini,

We have also noticed this with Boron very sporadically even without any 
explicit action taken on shard like Srini did

Srini,

Are you referring to “journal-plugin-fallback” from 
http://doc.akka.io/docs/akka/current/scala/general/configuration.html#config-akka-persistence ?

Regards
Muthu

From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Srini 
Seetharaman
Sent: Friday, August 11, 2017 9:40 AM
To: Tom Pantelis
Cc: controller-dev@lists.opendaylight.org
Subject: Re: [controller-dev] Circuit Breaker timed out

Thanks Tom. I will investigate further on why the local disk operation failed. 
Seems strange though because I haven't seen anything in dmesg.

The default value for the call-timeout is 10s in akka.conf.

On Thu, Aug 10, 2017 at 3:20 PM, Tom Pantelis <tompante...@gmail.com> wrote:
That error is from  akka persistence. It happens if the backend persistence 
plugin doesn't respond back in time. I've only seen this in a CSIT environment 
whose disk activity was overloaded. The timeouts can be tweaked - I don't 
recall exactly what they are but you can find them in the akka docs (names 
contain circuit-breaker).

On Thu, Aug 10, 2017 at 6:01 PM, Srini Seetharaman <srini.seethara...@gmail.com> wrote:
Hi Tom,
In our ODL deployment that is running in standalone mode with operational store 
persistence enabled, we saw the following error being printed. Once the 
member-1-default-operational shard is shutdown, all write transactions after 
that fail and the system becomes unstable. At this point, we were probably 
doing less than 10 transactions per second. Any idea what is causing this? Has 
anyone seen this before?


2017-08-07 19:15:59,622 | ERROR | lt-dispatcher-23 | Shard  
  | 176 - com.typesafe.akka.slf4j - 2.4.7 | Failed to persist event type 
[org.opendaylight.controller.cluster.raft.ReplicatedLogImplEntry] with sequence 
number [9897493] for persistenceId [member-1-shard-default-operational].
akka.pattern.CircuitBreaker$$anon$1: Circuit Breaker Timed out.
2017-08-07 19:15:59,628 | INFO  | lt-dispatcher-24 | Shard  
  | 188 - 
org.opendaylight.controller.sal-akka-raft
 - 1.4.2.Boron-SR2 | Stopping Shard member-1-shard-default-operational
2017-08-07 19:15:59,629 | ERROR | lt-dispatcher-23 | 
LocalThreePhaseCommitCohort  | 193 - 
org.opendaylight.controller.sal-distributed-datastore
 - 1.4.2.Boron-SR2 | Failed to prepare transaction 
member-1-datastore-operational-fe-5-txn-791019 on backend
java.lang.RuntimeException: Transaction aborted due to shutdown.
    at org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator.abortPendingTransactions(ShardCommitCoordinator.java:399)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
    at org.opendaylight.controller.cluster.datastore.Shard.postStop(Shard.java:211)[193:org.opendaylight.controller.sal-distributed-datastore:1.4.2.Boron-SR2]
    at akka.actor.Actor$class.aroundPostStop(Actor.scala:494)[175:com.typesafe.akka.actor:2.4.7]
    at akka.persistence.UntypedPersistentActor.akka$persistence$Eventsourced$$super$aroundPostStop(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
    at akka.persistence.Eventsourced$class.aroundPostStop(Eventsourced.scala:223)[181:com.typesafe.akka.persistence:2.4.7]
    at akka.persistence.UntypedPersistentActor.aroundPostStop(PersistentActor.scala:168)[181:com.typesafe.akka.persistence:2.4.7]
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)[175:com.typesafe.akka.actor:2.4.7]
    at akka.actor.dungeon.FaultHandling$class.handleChildTerminated(FaultHandling.scala:293)[175:com.typesafe.akka.actor:2.4.7]
    at akka.actor.ActorCell.handleChildTerminated(ActorCell.scala:374)[175:com.typesafe.akka.actor:2.4.7]
    at akka.actor.dungeon.DeathWatch$class.watchedActorTerminated(DeathWatch.scala:61)[175:com.typesafe.akka.actor:2.4.7]
    at akka.actor.ActorCell.watchedActorTerminated(ActorCell.scala:374)[175:com.typesafe.akka.actor:2.4.7]
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:460)[175:com.typesafe.akka.actor:2.4.7]
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:483)[175:com.typesafe.akka.actor:2.4.7]
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282)[175:com.typesafe.akka.actor:2.4.7]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:260)[175:com.typesafe.akka.actor:2.4.7]
    at akka.dispatch.Mailbox.run(Mailbox.scala:224)[175:com.typesafe.akka.actor:2.4.7]
    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)[175:com.typesafe.akka.actor:2.4.7]
    at scala.concur

Re: [controller-dev] Testing the FE BE separation of Datastore (Bug-5280)

2017-08-08 Thread Muthukumaran K
Hi Robert / Vratko, 

Going through Vratko's gerrit, I noticed the following:

"Prefix-based shards do not even support "non-chained" transactions"

I assume this caveat applies only to prefix-based sharding and does not 
necessarily hold for those who continue to use module-based sharding. Am I 
correct?
The reason I ask is that we want to run the NetVirt CSIT with the tell-based 
protocol enabled, to get a basic sanity pass with this change, and then do 
more focused testing with HA and scale scenarios.
focused testing with HA and scale scenarios. 

Regards
Muthu





-Original Message-
From: Muthukumaran K 
Sent: Thursday, August 03, 2017 10:51 AM
To: 'Robert Varga'; controller-dev
Cc: odl netvirt dev
Subject: RE: [controller-dev] Testing the FE BE separation of Datastore 
(Bug-5280)

Thanks Robert. With Tom's response on 'use-tell-based-protocol' config, I 
started taking a peek at the changes required. 
Was not aware of Vratko's gerrit. Will take a look at that too to sink it in 

Regards
Muthu


-Original Message-
From: Robert Varga [mailto:n...@hq.sk]
Sent: Wednesday, August 02, 2017 11:29 PM
To: Muthukumaran K; controller-dev
Cc: odl netvirt dev
Subject: Re: [controller-dev] Testing the FE BE separation of Datastore 
(Bug-5280)

On 24/07/17 17:28, Muthukumaran K wrote:
> Hi Robert,

Hello Muthu,

sorry for the late response, this email got left behind in my drafts folder.

> This is in context of the changes for  BZ-5280. We would like to test 
> the changes on master branch. Have some clarifications on the same
> 
>  
> 
> a)  Since this change had been major, as I recollect, there was a
> discussion earlier on isolating the changed code-path and old code path.
> Is the new code-path enabled by default on master ? To utilize changed 
> code-path,

As Tom already responded, this work is not active by default for the APIs 
currently used by our downstreams. It is used for fine-grained shards.

> 
> a)  any specific configuration changes in any cfg files (eg.
> Datastore cfg) required or

Just uncomment

#use-tell-based-protocol=true

in datastore.cfg before you start the controller. You should get an INFO 
message when it is enabled.

> b)  a new type of Databroker to be used to use the changed codepath
> ? In case of new databroker type, if there is any sample usage, can we 
> get some pointer for the same ?

This is an internal DS implementation detail, so no changes to wiring are 
necessary.

> b)  Can the changes be tested for specific AskTimeout scenarios -
> encountered earlier, like
> 
> a)  scaled transactions with and without Transaction Chain (if Txn
> Chain is used, do we still have to use pingpong broker for better 
> results ?) of course with appropriate heap-sizing and using G1GC

Yes. I would suggest testing both, though.

> b)  split-brain healing with medium volume of config-data -
> specifically , full split (eg. Downing node interfaces and bringing 
> them back up after brief period) and heal of cluster with and without 
> ongoing transactions. For completion sake, we can also test partial 
> split and heal
> 
>  
> 
> Any other specific variances of above scenario which could put the 
> changes to test ?

Idle split-brain recovery should not be affected. With tell-based protocol 
enabled, ongoing transactions should recover seamlessly if the partition heals 
reasonably quickly (2 minutes by default).

Vratko is writing up the test scenarios that are being executed somewhere. A 
progress report is at https://git.opendaylight.org/gerrit/56454. It would be 
nice if the test cases could be executed in an environment more stable than our 
CSIT substrate.

Regards,
Robert



Re: [controller-dev] Testing the FE BE separation of Datastore (Bug-5280)

2017-08-02 Thread Muthukumaran K
Thanks Robert. With Tom's response on 'use-tell-based-protocol' config, I 
started taking a peek at the changes required. 
Was not aware of Vratko's gerrit. Will take a look at that too to sink it in 

Regards
Muthu


-Original Message-
From: Robert Varga [mailto:n...@hq.sk] 
Sent: Wednesday, August 02, 2017 11:29 PM
To: Muthukumaran K; controller-dev
Cc: odl netvirt dev
Subject: Re: [controller-dev] Testing the FE BE separation of Datastore 
(Bug-5280)

On 24/07/17 17:28, Muthukumaran K wrote:
> Hi Robert,

Hello Muthu,

sorry for the late response, this email got left behind in my drafts folder.

> This is in context of the changes for  BZ-5280. We would like to test 
> the changes on master branch. Have some clarifications on the same
> 
>  
> 
> a)  Since this change had been major, as I recollect, there was a
> discussion earlier on isolating the changed code-path and old code path.
> Is the new code-path enabled by default on master ? To utilize changed 
> code-path,

As Tom already responded, this work is not active by default for the APIs 
currently used by our downstreams. It is used for fine-grained shards.

> 
> a)  any specific configuration changes in any cfg files (eg.
> Datastore cfg) required or

Just uncomment

#use-tell-based-protocol=true

in datastore.cfg before you start the controller. You should get an INFO 
message when it is enabled.

> b)  a new type of Databroker to be used to use the changed codepath
> ? In case of new databroker type, if there is any sample usage, can we 
> get some pointer for the same ?

This is an internal DS implementation detail, so no changes to wiring are 
necessary.

> b)  Can the changes be tested for specific AskTimeout scenarios -
> encountered earlier, like
> 
> a)  scaled transactions with and without Transaction Chain (if Txn
> Chain is used, do we still have to use pingpong broker for better 
> results ?) of course with appropriate heap-sizing and using G1GC

Yes. I would suggest testing both, though.

> b)  split-brain healing with medium volume of config-data -
> specifically , full split (eg. Downing node interfaces and bringing 
> them back up after brief period) and heal of cluster with and without 
> ongoing transactions. For completion sake, we can also test partial 
> split and heal
> 
>  
> 
> Any other specific variances of above scenario which could put the 
> changes to test ?

Idle split-brain recovery should not be affected. With tell-based protocol 
enabled, ongoing transactions should recover seamlessly if the partition heals 
reasonably quickly (2 minutes by default).

Vratko is writing up the test scenarios that are being executed somewhere. A 
progress report is at https://git.opendaylight.org/gerrit/56454. It would be 
nice if the test cases could be executed in an environment more stable than our 
CSIT substrate.

Regards,
Robert



[controller-dev] Testing the FE BE separation of Datastore (Bug-5280)

2017-07-24 Thread Muthukumaran K
Hi Robert,

This is in context of the changes for  BZ-5280. We would like to test the 
changes on master branch. Have some clarifications on the same


a)  Since this change had been major, as I recollect, there was a 
discussion earlier on isolating the changed code-path and old code path. Is the 
new code-path enabled by default on master ? To utilize changed code-path,

a)  any specific configuration changes in any cfg files (eg. Datastore cfg) 
required or

b)  a new type of Databroker to be used to use the changed codepath ? In 
case of new databroker type, if there is any sample usage, can we get some 
pointer for the same ?



b)  Can the changes be tested for specific AskTimeout scenarios - 
encountered earlier, like

a)  scaled transactions with and without Transaction Chain (if Txn Chain is 
used, do we still have to use pingpong broker for better results ?) of course 
with appropriate heap-sizing and using G1GC

b)  split-brain healing with medium volume of config-data - specifically , 
full split (eg. Downing node interfaces and bringing them back up after brief 
period) and heal of cluster with and without ongoing transactions. For 
completion sake, we can also test partial split and heal

Any other specific variances of above scenario which could put the changes to 
test ?

Regards
Muthu





[controller-dev] Cassandra plugin for Akka Persistence

2017-06-28 Thread Muthukumaran K
Hi,

Has anybody tried using the Cassandra plugin (v0.7, with the DataStax driver 
v3.2.0) for Akka persistence on ODL master, given that LevelDB is mainly for 
dev purposes and not 'recommended' for production?
We are planning to use Cassandra 3.7 for basic testing. Has anybody had 
success with earlier versions of Cassandra, e.g. 2.x?


Regards
Muthu
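
For anyone trying this, the switch itself is an akka.conf change; a hedged sketch (plugin IDs per the akka-persistence-cassandra project's documentation; contact points are placeholders, and ODL-specific wiring may differ):

```hocon
akka {
  persistence {
    journal.plugin        = "cassandra-journal"
    snapshot-store.plugin = "cassandra-snapshot-store"
  }
}

# Placeholder endpoints; point these at the real Cassandra nodes.
cassandra-journal.contact-points        = ["10.0.0.1", "10.0.0.2"]
cassandra-snapshot-store.contact-points = ["10.0.0.1", "10.0.0.2"]
```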





Re: [controller-dev] [mdsal-dev] Bug 7370 OOM due to suspected memory leak in akka.dispatch.Dispatcher found by hprof

2017-05-29 Thread Muthukumaran K
Regarding Netty's io.netty.util.concurrent.FastThreadLocalThread observation:

https://github.com/netty/netty/issues/6565

The issue seems very recent. I am looking further into what the reason could be.

Regards
Muthu


From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Tom Pantelis
Sent: Monday, May 29, 2017 7:30 PM
To: Michael Vorburger
Cc: controller-dev; ; 
mdsal-...@lists.opendaylight.org; openflowplugin-dev
Subject: Re: [controller-dev] [mdsal-dev] Bug 7370 OOM due to suspected memory 
leak in akka.dispatch.Dispatcher found by hprof


Yeah, that looks like an issue. DeviceInitializationUtils is doing a blocking 
get on a Future, which is usually not a good thing. And it occurred via an EOS 
data change and is blocking an akka Dispatcher thread.
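The general anti-pattern being described - blocking a dispatcher thread on 
Future.get() instead of attaching a callback - can be illustrated with plain 
java.util.concurrent. This is a standalone sketch, not the actual 
openflowplugin code; the names are made up for illustration:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class NonBlockingInit {
    // Stand-in for an async call such as a datastore read.
    static CompletableFuture<String> fetchNodeInfo() {
        return CompletableFuture.supplyAsync(() -> "node-info");
    }

    public static void main(String[] args) throws Exception {
        AtomicReference<String> result = new AtomicReference<>();
        // Bad: fetchNodeInfo().get() would pin the calling (dispatcher)
        // thread until the result arrives, starving the actor system.
        // Good: attach a callback so the thread is released immediately.
        CompletableFuture<Void> done = fetchNodeInfo()
            .thenAccept(result::set);
        done.get(5, TimeUnit.SECONDS); // only this demo's harness waits here
        System.out.println(result.get());
    }
}
```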

On a side note, there's a lot of threads with 
io.netty.util.concurrent.FastThreadLocalThread - not sure if that's normal.



On Mon, May 29, 2017 at 9:30 AM, Michael Vorburger <vorbur...@redhat.com> wrote:
+openflowplugin-dev & +ovsdb-dev:

Tom,

On Mon, May 29, 2017 at 2:57 PM, Tom Pantelis <tompante...@gmail.com> wrote:
Thanks a lot for replying, really appreciate it!

It looks like the Dispatcher was for data change notifications. I suspect a 
listener was hung or responding slowly so the actor's mailbox filled up with 
change notifications. I would suggest getting a thread dump next time.

Turns out there is no need to wait for next time - I just figured out that we 
can obtain thread dumps a posteriori from an HPROF using MAT... see the [4] 
Bug7370_Threads.zip HTML report just attached to Bug 7370.
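As an aside, a live thread dump can also be taken programmatically via the 
JDK's ThreadMXBean, which produces output similar to what jstack or MAT shows. 
A minimal standalone sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Dump every live thread's name, state, and stack frames.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            System.out.println("\"" + info.getThreadName()
                + "\" state=" + info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```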

It shows 604 threads (a lot?), many of which are e.g. parked ForkJoinPool 
threads, and a number of them related to ovsdb and openflowplugin... so what 
are we looking for in this thread dump? I haven't looked through each thread's 
stack yet, but this one vaguely looks like what you may mean by "a listener 
was hung or responding slowly" (causing "the actor's mailbox filled up with 
change notifications"). Could it possibly be the reason for, or have something 
to do with, this OOM:

opendaylight-cluster-data-akka.actor.default-dispatcher-16

  at sun.misc.Unsafe.park(ZJ)V (Native Method)

  at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V 
(LockSupport.java:175)

  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt()Z 
(AbstractQueuedSynchronizer.java:836)

  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(I)V
 (AbstractQueuedSynchronizer.java:997)

  at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(I)V
 (AbstractQueuedSynchronizer.java:1304)

  at 
com.google.common.util.concurrent.AbstractFuture$Sync.get()Ljava/lang/Object; 
(AbstractFuture.java:285)

  at com.google.common.util.concurrent.AbstractFuture.get()Ljava/lang/Object; 
(AbstractFuture.java:116)

  at 
org.opendaylight.openflowplugin.impl.util.DeviceInitializationUtils.initializeNodeInformation(Lorg/opendaylight/openflowplugin/api/openflow/device/DeviceContext;ZLorg/opendaylight/openflowplugin/openflow/md/core/sal/convertor/ConvertorExecutor;)V
 (DeviceInitializationUtils.java:155)

  at 
org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.onContextInstantiateService(Lorg/opendaylight/openflowplugin/api/openflow/connection/ConnectionContext;)Z
 (DeviceContextImpl.java:730)

  at 
org.opendaylight.openflowplugin.impl.lifecycle.LifecycleServiceImpl.instantiateServiceInstance()V
 (LifecycleServiceImpl.java:53)

  at 
org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceRegistrationDelegator.instantiateServiceInstance()V
 (ClusterSingletonServiceRegistrationDelegator.java:46)

  at 
org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.takeOwnership()V
 (ClusterSingletonServiceGroupImpl.java:291)

  at 
org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.ownershipChanged(Lorg/opendaylight/mdsal/eos/common/api/GenericEntityOwnershipChange;)V
 (ClusterSingletonServiceGroupImpl.java:237)

  at 
org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.ownershipChanged(Lorg/opendaylight/mdsal/eos/common/api/GenericEntityOwnershipChange;)V
 (AbstractClusterSingletonServiceProviderImpl.java:145)

  at 
org.opendaylight.mdsal.singleton.dom.impl.DOMClusterSingletonServiceProviderImpl.ownershipChanged(Lorg/opendaylight/mdsal/eos/dom/api/DOMEntityOwnershipChange;)V
 (DOMClusterSingletonServiceProviderImpl.java:23)

  at 
org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.onEntityOwnershipChanged(Lorg/opendaylight/mdsal/eos/dom/api/DOMEntityOwnershipChange;)V
 (EntityOwnershipListenerActor.java:46)

  at 
org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.handleReceive(Ljava/lang/Object;)V
 (EntityOwnershipListenerActor.java:36)

  at 
org.opendaylight.controller.c

Re: [controller-dev] [mdsal-dev] Bug 7370 OOM due to suspected memory leak in akka.dispatch.Dispatcher found by hprof

2017-05-29 Thread Muthukumaran K
I agree with Tom.

I had experienced this earlier due to a laggard listener, back before the 
stable Carbon cut. Eventually the listener got fixed (not in connection with 
this observation - I had just picked up a build spaced well after the original 
one) and I have not been able to reproduce it since.

Listener mailboxes are unbounded so that notifications do not get lost by 
being discarded to dead letters.
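To illustrate what "unbounded" means here: in Akka, a dispatcher's mailbox type 
is configurable, and an unbounded mailbox queues messages indefinitely rather 
than dropping them to dead letters - which is also why a hung listener can 
eventually exhaust the heap. A sketch with illustrative names (this is not 
ODL's actual dispatcher configuration):

```hocon
# Unbounded mailbox: every notification stays queued until processed
notification-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  mailbox-type = "akka.dispatch.UnboundedMailbox"
}
```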

Regards
Muthu




From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Tom Pantelis
Sent: Monday, May 29, 2017 6:28 PM
To: Michael Vorburger
Cc: controller-dev; mdsal-...@lists.opendaylight.org
Subject: Re: [controller-dev] [mdsal-dev] Bug 7370 OOM due to suspected memory 
leak in akka.dispatch.Dispatcher found by hprof

It looks like the Dispatcher was for data change notifications. I suspect a 
listener was hung or responding slowly so the actor's mailbox filled up with 
change notifications. I would suggest getting a thread dump next time.

On Mon, May 29, 2017 at 7:52 AM, Michael Vorburger <vorbur...@redhat.com> wrote:
Hi guys,
I just ran MAT([1]) over an HPROF heap dump on OOM in Bug 7370, and it (MAT) 
raises a "leak suspect" in akka.dispatch.Dispatcher - see the [3] 
java_pid19570_Leak_Suspects.zip just attached to Bug 7370 ... questions:
Is this perhaps something you jump at with an "ah that, we know about it and 
already fixed that in ..." ?

If not, how do we go about better understanding the root cause of this, and be 
able to eventually fix this?

My underlying assumption here is that this isn't "normal" and not just "by 
design" - if it is, I'd love some education... like I'm hoping that the 
conclusion here isn't simply that MD-SAL's data store is a dumb in-memory 
database which basically just takes a huge number of GBs to keep (all) YANG 
model instances on the heap - or is it?

Tx,
M.

[1] https://www.eclipse.org/mat/

[2] https://bugs.opendaylight.org/show_bug.cgi?id=7370

[3] https://bugs.opendaylight.org/attachment.cgi?id=1816
--
Michael Vorburger, Red Hat
vorbur...@redhat.com | IRC: vorburger @freenode | 
~ = http://vorburger.ch

___
mdsal-dev mailing list
mdsal-...@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/mdsal-dev



Re: [controller-dev] Backward compatibility of akka-persistence journal

2017-03-29 Thread Muthukumaran K
Hi Srini,

Actually, changing the revision-date on model changes is among the recommended 
practices of RFC 6020 in general - https://tools.ietf.org/html/rfc6020#section-10

But could you please elaborate on how referring to the revision in 
module-shards.conf would be useful in the upgradeability context ?

I have noticed that models are changed *without* bumping the revision dates.
As far as I can tell, one main reason for the inhibition against religiously 
changing the revision-date across model changes is that the binding classes 
generated from yang models include the revision in the package name - e.g. 
*.revABCD.*. If the revision-date gets changed at the model level, the 
generated packages also change, and subsequently all modules that use the 
generated binding classes for the corresponding yang model would have to 
change their imports.
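To illustrate the coupling: the binding codegen derives a rev<yyMMdd> package 
segment from the module's revision-date, so bumping the date moves every 
generated class to a new package. The helper below is hypothetical - only the 
rev<yyMMdd> naming convention is taken from ODL's generated packages:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class RevisionPackage {
    // Hypothetical helper mirroring the rev<yyMMdd> segment that the
    // binding generator embeds in generated package names.
    static String revisionSegment(String revisionDate) {
        LocalDate date = LocalDate.parse(revisionDate);
        return "rev" + date.format(DateTimeFormatter.ofPattern("yyMMdd"));
    }

    public static void main(String[] args) {
        // e.g. org.opendaylight.yang.gen.v1.<namespace>.rev150820.*
        System.out.println(revisionSegment("2015-08-20"));
    }
}
```

A revision bump from 2015-08-20 to, say, 2016-01-05 would therefore change 
every importing module's references from *.rev150820.* to *.rev160105.*.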

Regards
Muthu


From: srini...@gmail.com [mailto:srini...@gmail.com] On Behalf Of Srini 
Seetharaman
Sent: Wednesday, March 29, 2017 7:31 AM
To: Tom Pantelis
Cc: Muthukumaran K; controller-dev@lists.opendaylight.org
Subject: Re: [controller-dev] Backward compatibility of akka-persistence journal

When using module-level sharding, it would be good if we could mention the 
revision-date for the module in module-shard.conf. That would significantly 
help with model upgrading for production controllers. I hope that can be 
considered for Carbon.

On Fri, Mar 24, 2017 at 7:12 AM, Tom Pantelis <tompante...@gmail.com> wrote:
Since anything not tied to a specific shard goes into default, there could be 
contention if multiple yang modules get hit with high volume. Using specific 
shards alleviates that and also allows for more granularity wrt shard-specific 
settings like persistence and replication. Eg, you might want one yang module 
to be replicated and another local-only.

On Fri, Mar 24, 2017 at 10:04 AM, Srini Seetharaman <srini.seethara...@gmail.com> wrote:
Thanks Muthu and Tom. So, for the case of my migration at this point,
it feels like the easiest way is to retain the default sharding
instead of defining my per-module sharding. Is there any performance
downside to just using the default shard?

On Fri, Mar 24, 2017 at 3:54 AM, Muthukumaran K <muthukumara...@ericsson.com> wrote:
> Data import / export is a new feature on the master branch which can export
> datastore contents in JSON format and also import the same. Since this
> feature is not there in earlier releases, it would not be usable for moving
> data from Be to Bo.
>
>
>
> Regards
>
> Muthu
>
>
>
>
>
> From: controller-dev-boun...@lists.opendaylight.org
> [mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Tom Pantelis
> Sent: Friday, March 24, 2017 4:13 PM
> To: Srini Seetharaman
> Cc: controller-dev@lists.opendaylight.org
> Subject: Re: [controller-dev] Backward compatibility of akka-persistence
> journal
>
>
>
> There aren't any cluster-admin RPCs to define new shards and migrate data.
> You'd have to capture the data via REST from Beryllium and re-write it.
> There is also a data import/export project but I'm not really familiar with
> it.
>
>
>
> On Fri, Mar 24, 2017 at 1:06 AM, Srini Seetharaman <srini.seethara...@gmail.com> wrote:
>
>> "instead of just relying on the default" - I assume you're referring to
>> the default shard. All yang modules for which there isn't a shard specified
>> in the .conf files are stored in the default shard. I suspect in your case
>> the previous journal backup had the yang module in question stored in the
>> default shard. However you had a specific shard defined in the .conf files
>> so it went to that shard to read the data. The data in the default shard
>> still exists and was restored but it just can't be accessed b/c reads/writes
>> go to the specific shard.
>
> Totally explains what I'm going through.
>
> Is there a way to port data over from my older Beryllium controller
> backup, which used the default shard, to my new Boron controller that
> uses module-specific shards? Can I perform the restore first, and then
> use the cluster-admin RPC to create the shards to move data over from
> the default to the module-specific shard?
>
>




Re: [controller-dev] Backward compatibility of akka-persistence journal

2017-03-24 Thread Muthukumaran K
Data import / export is a new feature on the master branch which can export 
datastore contents in JSON format and also import the same. Since this feature 
is not there in earlier releases, it would not be usable for moving data from 
Be to Bo.

Regards
Muthu


From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Tom Pantelis
Sent: Friday, March 24, 2017 4:13 PM
To: Srini Seetharaman
Cc: controller-dev@lists.opendaylight.org
Subject: Re: [controller-dev] Backward compatibility of akka-persistence journal

There aren't any cluster-admin RPCs to define new shards and migrate data. 
You'd have to capture the data via REST from Beryllium and re-write it. There 
is also a data import/export project but I'm not really familiar with it.

On Fri, Mar 24, 2017 at 1:06 AM, Srini Seetharaman <srini.seethara...@gmail.com> wrote:
> "instead of just relying on the default" - I assume you're referring to the 
> default shard. All yang modules for which there isn't a shard specified in 
> the .conf files are stored in the default shard. I suspect in your case the 
> previous journal backup had the yang module in question stored in the default 
> shard. However you had a specific shard defined in the .conf files so it went 
> to that shard to read the data. The data in the default shard still exists 
> and was restored but it just can't be accessed b/c reads/writes go to the 
> specific shard.

Totally explains what I'm going through.

Is there a way to port data over from my older Beryllium controller
backup, which used the default shard, to my new Boron controller that
uses module-specific shards? Can I perform the restore first, and then
use the cluster-admin RPC to create the shards to move data over from
the default to the module-specific shard?



Re: [controller-dev] Backward compatibility of akka-persistence journal

2017-03-24 Thread Muthukumaran K
Hi Tom,

I assume using backup and restore for the below scenario would still have the 
same issue, right ? As I understand it, backup and restore may make the 
content agnostic to the node members, but the backed-up data will still carry 
shard-level metadata, which can impact restoration - I am assuming a static 
shard configuration in this context.

Regards
Muthu


From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Tom Pantelis
Sent: Friday, March 24, 2017 10:27 AM
To: Srini Seetharaman
Cc: controller-dev@lists.opendaylight.org
Subject: Re: [controller-dev] Backward compatibility of akka-persistence journal

yes - module-shard.conf and modules.conf are still used, in Carbon as well. 
They specify the static shard and member configuration. The cluster admin RPCs 
can be used to dynamically add/remove shard replicas at which point the shard 
memberships are stored in the journal and the static configuration is no longer 
used.

"instead of just relying on the default" - I assume you're referring to the 
default shard. All yang modules for which there isn't a shard specified in the 
.conf files are stored in the default shard. I suspect in your case the 
previous journal backup had the yang module in question stored in the default 
shard. However you had a specific shard defined in the .conf files so it went 
to that shard to read the data. The data in the default shard still exists and 
was restored but it just can't be accessed b/c reads/writes go to the specific 
shard.
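For anyone following along, the static configuration Tom refers to lives in two 
files under the controller's configuration directory. A sketch of a 
module-specific shard (the module name, namespace, and member names are 
illustrative - check your distribution's shipped .conf files for the exact 
entries):

```hocon
# modules.conf -- map a yang namespace to a named module
modules = [
  {
    name = "topology"
    namespace = "urn:TBD:params:xml:ns:yang:network-topology"
    shard-strategy = "module"
  }
]

# module-shards.conf -- place that module's shard on specific members;
# modules without an entry here land in the default shard
module-shards = [
  {
    name = "topology"
    shards = [
      {
        name = "topology"
        replicas = ["member-1", "member-2", "member-3"]
      }
    ]
  }
]
```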

On Fri, Mar 24, 2017 at 12:37 AM, Srini Seetharaman <srini.seethara...@gmail.com> wrote:
My bad. The restore works fine once I remove the entries I made in 
module-shard.conf for the specific modules I am using. For some reason, having 
an entry blocks the restore.

With Beryllium and Boron, do we still use the module-shard.conf and 
modules.conf? Is there a doc that gives more info on when to use it instead of 
just relying on the default?

On Sun, Mar 19, 2017 at 10:29 AM, Srini Seetharaman <srini.seethara...@gmail.com> wrote:
Thanks Tom for the quick reply. Nothing shows up in the config datastore on 
warm restart with the old journal+snapshot. I didn't see any error either. I can 
turn on debug mode and check. I'll also try the online restore and let you know.

From this doc  
http://doc.akka.io/docs/akka/2.4/project/migration-guide-persistence-experimental-2.3.x-2.4.x.html
 it seemed no binary compatibility is offered for the journal.


On Sunday, March 19, 2017, Tom Pantelis <tompante...@gmail.com> wrote:
What doesn't work exactly? Is there an error? From what I recall I thought they 
were compatible wrt the journal schema but I could be wrong.

You could use the online backup/restore.

On Sun, Mar 19, 2017 at 12:50 PM, Srini Seetharaman <srini.seethara...@gmail.com> wrote:
Hi, I recently switched to using stable/boron, which uses akka version 2.4.7. I 
have an old backup of the journal and snapshot from beryllium-sr3. I noticed 
that restoring this backup to a boron cluster doesn't work.

Perhaps this is because the 2.3 experimental akka-persistence is not compatible 
with 2.4 akka-persistence. I wanted to confirm this. I also wanted to see if 
anyone has any ideas or a workaround to import a journal+snapshot from 
beryllium onto boron.

Thanks much!

Srini.



Re: [controller-dev] EOS entity without an owner after data store exception

2017-03-22 Thread Muthukumaran K
Thanks for the update Siva

Regards
Muthu

From: Sivasamy Kaliappan [mailto:sivasa...@gmail.com]
Sent: Wednesday, March 22, 2017 6:03 PM
To: Tom Pantelis
Cc: Muthukumaran K; controller-dev
Subject: Re: [controller-dev] EOS entity without an owner after data store 
exception

We are not seeing this issue after upgrading to Boron. Thanks!

On Thu, Mar 2, 2017 at 3:59 PM, Tom Pantelis <tompante...@gmail.com> wrote:
yes - I didn't realize it was Beryllium. I would suggest trying Boron - 
PreLeader fix as you mentioned plus a lot of hardening in EOS.

On Wed, Mar 1, 2017 at 11:35 PM, Muthukumaran K <muthukumara...@ericsson.com> wrote:
Hi Tom,

From the stacktrace this appears to be Beryllium SR3 - 1.3.3.Beryllium-SR3.
As I recollect, this looks like the symptom which called for the PreLeader 
fix. So, would it be better to try out Boron ?

Regards
Muthu


From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Tom Pantelis
Sent: Wednesday, March 01, 2017 9:03 PM
To: Sivasamy Kaliappan <sivasa...@gmail.com>
Cc: controller-dev <controller-dev@lists.opendaylight.org>
Subject: Re: [controller-dev] EOS entity without an owner after data store 
exception

Please open a bug. If it's reproducible, it would be helpful to enable debug 
for org.opendaylight.controller.cluster.datastore.entityownership on each node 
and provide the log files.

On Wed, Mar 1, 2017 at 4:34 AM, Sivasamy Kaliappan <sivasa...@gmail.com> wrote:
All,

We have a 3-node cluster and an entity A defined. On startup all the nodes in 
the cluster create the entity and register it. Following is the sequence of 
events:

  1.  During cluster startup an owner for the entity A is elected and all 
listeners are informed
  2.  A leader for entity ownership shard is elected (different from entity 
leader)
  3.  After this election I am seeing below data store exception in the newly 
elected shard leader
  4.  After this exception I am getting an owner-changed event where 
hasOwner=false, i.e. currently there is no owner for this entity, and it 
remains in this state forever
  5.  org.opendaylight.controller.cluster.datastore.Shard.finishCommit() method 
has a comment that during edge cases the data store will throw 
IllegalStateException
Is this behavior expected? What should we do when there is no owner for the 
entity in the cluster?

No Owner Event:

2017-01-27 22:13:25,920 | INFO  | lt-dispatcher-16 | EntityOwnerChangeListener  
 | 283 - com.aaa.odl - 0.1.0.SNAPSHOT | 
ownershipChanged:request,handleOwnershipChanged: EntityOwnershipChanged 
[entity=Entity{type='controller', 
id=/(urn:opendaylight:params:xml:ns:yang:controller:md:sal:core:general-entity?revision=2015-08-20)entity/entity[{(urn:opendaylight:params:xml:ns:yang:controller:md:sal:core:general-entity?revision=2015-08-20)name=controller}]},
 wasOwner=false, isOwner=false, hasOwner=false, inJeopardy=false] event 
received for entity Entity{type='controller', 
id=/(urn:opendaylight:params:xml:ns:yang:controller:md:sal:core:general-entity?revision=2015-08-20)entity/entity[{(urn:opendaylight:params:xml:ns:yang:controller:md:sal:core:general-entity?revision=2015-08-20)name=controller}]}


Exception:

2017-01-27 22:13:25,787 | WARN  | lt-dispatcher-17 | EntityOwnershipShard   
  | 140 - org.opendaylight.controller.sal-akka-raft - 1.3.3.Beryllium-SR3 | 
member-2-shard-entity-ownership-operational: commit failed for transaction 
member-2-txn-2-1485584005749 - retrying as foreign candidate
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Store 
tree 
org.opendaylight.yangtools.yang.data.api.schema.tree.spi.MaterializedContainerNode@3316bc82
 and candidate base 
org.opendaylight.yangtools.yang.data.api.schema.tree.spi.MaterializedContainerNode@7a30a559
 differ.
at 
com.google.common.util.concurrent.Futures$ImmediateFailedFuture.get(Futures.java:190)[37:com.google.guava:18.0.0]
at 
org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator$CohortEntry.commit(ShardCommitCoordinator.java:670)[143:org.opendaylight.controller.sal-distributed-datastore:1.3.3.Beryllium-SR3]
at 
org.opendaylight.controller.cluster.datastore.Shard.finishCommit(Shard.java:352)[143:org.opendaylight.controller.sal-distributed-datastore:1.3.3.Beryllium-SR3]
at 
org.opendayligh

Re: [controller-dev] prevLogIndex 3 was found in the log but the term -1 is not equal to the append entriesprevLogTerm 1 - lastIndex: 5, snapshotIndex: 4

2017-03-20 Thread Muthukumaran K
@Guy,

URL should be

http://:8181/jolokia/read/org.opendaylight.controller:Category=Shards,name=member--shard-topology-operational,type=DistributedOperationalDatastore

Regards
Muthu



From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Sela, Guy
Sent: Monday, March 20, 2017 7:24 PM
To: Tom Pantelis 
Cc: odl netvirt dev ; 
controller-dev@lists.opendaylight.org
Subject: Re: [controller-dev] prevLogIndex 3 was found in the log but the term 
-1 is not equal to the append entriesprevLogTerm 1 - lastIndex: 5, 
snapshotIndex: 4

https://bugs.opendaylight.org/show_bug.cgi?id=8022

The dump from  
http://:8181/jolokia/read/org.opendaylight.controller:Category=Shards,name=member--shard-topology-operational,type=DistributedOperDatastore didn’t work for me.
So I only added the logs with TRACE on SHARDs.


From: Tom Pantelis [mailto:tompante...@gmail.com]
Sent: Monday, March 20, 2017 2:41 PM
To: Sela, Guy <guy.s...@hpe.com>
Cc: controller-dev@lists.opendaylight.org; odl netvirt dev 
<netvirt-...@lists.opendaylight.org>
Subject: Re: [controller-dev] prevLogIndex 3 was found in the log but the term 
-1 is not equal to the append entriesprevLogTerm 1 - lastIndex: 5, 
snapshotIndex: 4

Actually set the log level to trace instead of debug.

On Mon, Mar 20, 2017 at 8:29 AM, Tom Pantelis <tompante...@gmail.com> wrote:
Hmm... please enable Shard debug (add 
log4j.logger.org.opendaylight.controller.cluster.datastore.Shard=debug to 
etc/org.ops4j.pax.logging.cfg) on all 3 nodes and let it run for like 10 sec 
and capture the logs and open a bug. Also capture the JMX Shard output from 
each node, ie:

   
http://:8181/jolokia/read/org.opendaylight.controller:Category=Shards,name=member--shard-topology-operational,type=DistributedOperDatastore

replacing  appropriately for each node.

Please provide the steps that led up to it, eg were nodes restarted etc

On Mon, Mar 20, 2017 at 7:50 AM, Sela, Guy <guy.s...@hpe.com> wrote:
Latest code in a 3-node cluster, and I’m seeing the following errors while OVSs 
are connecting to the ODL.

2017-03-20 13:48:36,101 | INFO  | lt-dispatcher-29 | Shard  
  | 207 - 
org.opendaylight.controller.sal-clustering-commons
 - 1.5.0.SNAPSHOT | member-3-shard-topology-operational (Follower): The 
prevLogIndex 3 was found in the log but the term -1 is not equal to the append 
entriesprevLogTerm 1 - lastIndex: 5, snapshotIndex: 4
2017-03-20 13:48:36,101 | INFO  | lt-dispatcher-29 | Shard  
  | 207 - 
org.opendaylight.controller.sal-clustering-commons
 - 1.5.0.SNAPSHOT | member-3-shard-topology-operational (Follower): Follower is 
out-of-sync so sending negative reply: AppendEntriesReply [term=1, 
success=false, followerId=member-3-shard-topology-operational, logLastIndex=5, 
logLastTerm=1, forceInstallSnapshot=false, payloadVersion=5, raftVersion=3]
2017-03-20 13:48:36,621 | INFO  | lt-dispatcher-51 | Shard  
  | 207 - 
org.opendaylight.controller.sal-clustering-commons
 - 1.5.0.SNAPSHOT | member-3-shard-topology-operational (Follower): The 
prevLogIndex 3 was found in the log but the term -1 is not equal to the append 
entriesprevLogTerm 1 - lastIndex: 5, snapshotIndex: 4
2017-03-20 13:48:36,621 | INFO  | lt-dispatcher-51 | Shard  
  | 207 - 
org.opendaylight.controller.sal-clustering-commons
 - 1.5.0.SNAPSHOT | member-3-shard-topology-operational (Follower): Follower is 
out-of-sync so sending negative reply: AppendEntriesReply [term=1, 
success=false, followerId=member-3-shard-topology-operational, logLastIndex=5, 
logLastTerm=1, forceInstallSnapshot=false, payloadVersion=5, raftVersion=3]
2017-03-20 13:48:37,140 | INFO  | lt-dispatcher-29 | Shard  
  | 207 - 
org.opendaylight.controller.sal-clustering-commons
 - 1.5.0.SNAPSHOT | member-3-shard-topology-operational (Follower): The 
prevLogIndex 3 was found in the log but the term -1 is not equal to the append 
entriesprevLogTerm 1 - lastIndex: 5, snapshotIndex: 4
2017-03-20 13:48:37,140 | INFO  | lt-dispatcher-29 | Shard  
  | 207 - 
org.opendaylight.controller.sal-clustering-commons
 - 1.5.0.SNAPSHOT | member-3-shard-topology-operational (Follower): Follower is 
out-of-sync so sending negative reply: AppendEntriesReply [term=1, 
success=false, followerId=member-3-shard-topology-operational, logLastIndex=5, 
logLastTerm=1, forceInstallSnapshot=false, payloadVersion=5, raftVersion=3]
2017-03-20 13:48:37,661 | INFO  | lt-dispatcher-51 | Shard  
  | 207 - 
org.opendaylight.controller.sa

Re: [controller-dev] EOS entity without an owner after data store exception

2017-03-01 Thread Muthukumaran K
Hi Tom,

From the stacktrace this appears to be Beryllium SR3 - 1.3.3.Beryllium-SR3.
As I recollect, this looks like the symptom which called for the PreLeader 
fix. So, would it be better to try out Boron ?

Regards
Muthu


From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Tom Pantelis
Sent: Wednesday, March 01, 2017 9:03 PM
To: Sivasamy Kaliappan 
Cc: controller-dev 
Subject: Re: [controller-dev] EOS entity without an owner after data store 
exception

Please open a bug. If it's reproducible, it would be helpful to enable debug 
for org.opendaylight.controller.cluster.datastore.entityownership on each node 
and provide the log files.

On Wed, Mar 1, 2017 at 4:34 AM, Sivasamy Kaliappan <sivasa...@gmail.com> wrote:
All,

We have a 3-node cluster and an entity A defined. On startup all the nodes in 
the cluster create the entity and register it. Following is the sequence of 
events:

  1.  During cluster startup an owner for the entity A is elected and all 
listeners are informed
  2.  A leader for entity ownership shard is elected (different from entity 
leader)
  3.  After this election I am seeing below data store exception in the newly 
elected shard leader
  4.  After this exception I am getting an owner-changed event where 
hasOwner=false, i.e. currently there is no owner for this entity, and it 
remains in this state forever
  5.  org.opendaylight.controller.cluster.datastore.Shard.finishCommit() method 
has a comment that during edge cases data store will throw IllegalStateException
Is this behavior expected? What should we do when there is no owner for the 
entity in the cluster?

No Owner Event:

2017-01-27 22:13:25,920 | INFO  | lt-dispatcher-16 | EntityOwnerChangeListener  
 | 283 - com.aaa.odl - 0.1.0.SNAPSHOT | 
ownershipChanged:request,handleOwnershipChanged: EntityOwnershipChanged 
[entity=Entity{type='controller', 
id=/(urn:opendaylight:params:xml:ns:yang:controller:md:sal:core:general-entity?revision=2015-08-20)entity/entity[{(urn:opendaylight:params:xml:ns:yang:controller:md:sal:core:general-entity?revision=2015-08-20)name=controller}]},
 wasOwner=false, isOwner=false, hasOwner=false, inJeopardy=false] event 
received for entity Entity{type='controller', 
id=/(urn:opendaylight:params:xml:ns:yang:controller:md:sal:core:general-entity?revision=2015-08-20)entity/entity[{(urn:opendaylight:params:xml:ns:yang:controller:md:sal:core:general-entity?revision=2015-08-20)name=controller}]}


Exception:

2017-01-27 22:13:25,787 | WARN  | lt-dispatcher-17 | EntityOwnershipShard   
  | 140 - org.opendaylight.controller.sal-akka-raft - 1.3.3.Beryllium-SR3 | 
member-2-shard-entity-ownership-operational: commit failed for transaction 
member-2-txn-2-1485584005749 - retrying as foreign candidate
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Store 
tree 
org.opendaylight.yangtools.yang.data.api.schema.tree.spi.MaterializedContainerNode@3316bc82
 and candidate base 
org.opendaylight.yangtools.yang.data.api.schema.tree.spi.MaterializedContainerNode@7a30a559
 differ.
at 
com.google.common.util.concurrent.Futures$ImmediateFailedFuture.get(Futures.java:190)[37:com.google.guava:18.0.0]
at 
org.opendaylight.controller.cluster.datastore.ShardCommitCoordinator$CohortEntry.commit(ShardCommitCoordinator.java:670)[143:org.opendaylight.controller.sal-distributed-datastore:1.3.3.Beryllium-SR3]
at 
org.opendaylight.controller.cluster.datastore.Shard.finishCommit(Shard.java:352)[143:org.opendaylight.controller.sal-distributed-datastore:1.3.3.Beryllium-SR3]
at 
org.opendaylight.controller.cluster.datastore.Shard.finishCommit(Shard.java:420)[143:org.opendaylight.controller.sal-distributed-datastore:1.3.3.Beryllium-SR3]
at 
org.opendaylight.controller.cluster.datastore.Shard.applyState(Shard.java:668)[143:org.opendaylight.controller.sal-distributed-datastore:1.3.3.Beryllium-SR3]
at 
org.opendaylight.controller.cluster.raft.RaftActor.handleCommand(RaftActor.java:225)[140:org.opendaylight.controller.sal-akka-raft:1.3.3.Beryllium-SR3]
at 
org.opendaylight.controller.cluster.common.actor.AbstractUntypedPersistentActor.onReceiveCommand(AbstractUntypedPersistentActor.java:36)[139:org.opendaylight.controller.sal-clustering-commons:1.3.3.Beryllium-SR3]
at 
org.opendaylight.controller.cluster.datastore.Shard.onReceiveCommand(Shard.java:276)[143:org.opendaylight.controller.sal-distributed-datastore:1.3.3.Beryllium-SR3]
at 
org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipShard.onReceiveCommand(EntityOwnershipShard.java:137)[143:org.opendaylight.controller.sal-distributed-datastore:1.3.3.Beryllium-SR3]
at 
akka.persistence.Untyp

Re: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: OutOfMemory from datastore

2017-01-12 Thread Muthukumaran K
I fully agree Sela.

Shard configuration is a flexibility meant for deployment engineering - a choice 
to be made as part of a module's deployment plan. So, as it stands today, the 
earlier the decision of whether to dedicate a shard to our specific model is 
taken, the more cautious we can be.

Of course, I am setting aside the runtime resource cost of creating too many 
shards (please note that every shard is a mini RAFT instance in itself, 
replicated across the cluster).

The internal implementation is one and the same.

Regards
Muthu


From: Sela, Guy [mailto:guy.s...@hpe.com]
Sent: Thursday, January 12, 2017 4:21 PM
To: Vishal Thapar ; Muthukumaran K 
; Tom Pantelis 
Cc: odl netvirt dev ; 
controller-dev@lists.opendaylight.org; integration-...@lists.opendaylight.org; 
Kochba, Alon 
Subject: RE: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: 
OutOfMemory from datastore

Hi,

I’ll use the same argument that I gave to Muthu:
“
Shards configuration is just configuration, and can be determined in the 
installation/distribution of the product.
The code is always the same code.
One distribution could decide to split the netvirt modules into 3 shards, and a 
different one could decide to leave everything in the default shard.
If I want to be safe while writing the code, I have to consider the "worst 
case" and assume that my module is potentially in a shard of its own.
Unless you want to say that the shard configuration file should never be 
touched?
“

From: Vishal Thapar [mailto:vishal.tha...@ericsson.com]
Sent: Thursday, January 12, 2017 12:18 PM
To: Muthukumaran K <muthukumara...@ericsson.com>; Sela, Guy <guy.s...@hpe.com>; 
Tom Pantelis <tompante...@gmail.com>
Cc: odl netvirt dev <netvirt-...@lists.opendaylight.org>; 
controller-dev@lists.opendaylight.org; integration-...@lists.opendaylight.org; 
Kochba, Alon <alo...@hpe.com>
Subject: RE: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: 
OutOfMemory from datastore

Hi Guy,

You can look at BatchingUtils in Genius that we use for InterfaceManager. Since 
all IFM data goes to TopologyConfig or Default config shard, we batch 
transactions separately. As of today we do know statically which yang model 
goes to which shard and we’re taking advantage of it to do batching in 
Genius/Netvirt.
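The per-shard batching idea described here can be sketched generically (this is not the actual Genius BatchingUtils API; the class and the module-to-shard map below are invented for illustration): queue writes by the shard their module maps to, then flush each shard's queue as its own batch/transaction.

```python
from collections import defaultdict

# Hypothetical module-to-shard mapping (module-level sharding strategy);
# anything not listed falls back to the default shard.
SHARD_FOR_MODULE = {
    "network-topology": "topology",
    "opendaylight-inventory": "inventory",
}

class ShardBatcher:
    """Queue writes per shard, then flush each queue as one batch."""

    def __init__(self):
        self.queues = defaultdict(list)

    def write(self, module, path, data):
        shard = SHARD_FOR_MODULE.get(module, "default")
        self.queues[shard].append((path, data))

    def flush(self, submit):
        # One submit() call per shard, mimicking one transaction per
        # shard rather than one transaction spanning several shards.
        for shard, batch in self.queues.items():
            submit(shard, batch)
        self.queues.clear()
```

Grouping this way keeps each submitted batch within a single shard boundary, which is the property the thread recommends preserving.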

Regards,
Vishal.

From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of 
Muthukumaran K
Sent: 12 January 2017 15:00
To: Sela, Guy <guy.s...@hpe.com>; Tom Pantelis <tompante...@gmail.com>
Cc: odl netvirt dev <netvirt-...@lists.opendaylight.org>; 
controller-dev@lists.opendaylight.org; integration-...@lists.opendaylight.org; 
Kochba, Alon <alo...@hpe.com>
Subject: Re: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: 
OutOfMemory from datastore

Hi Sela,

A transaction is with respect to a shard. So, if we take Netvirt for example, 
all the config yang modules get mapped to the default config shard. So, it's OK 
to have a single config transaction spanning multiple yang models.

But if you are configuring a flow (which maps to the Inventory config shard) 
and also some tunnel config (which could be part of the default config shard) 
in the same transaction, then we cross shard boundaries, and what Robert says 
holds true for that case.

Regards
Muthu


From: Sela, Guy [mailto:guy.s...@hpe.com]
Sent: Thursday, January 12, 2017 2:53 PM
To: Muthukumaran K <muthukumara...@ericsson.com>; Tom Pantelis 
<tompante...@gmail.com>
Cc: odl netvirt dev <netvirt-...@lists.opendaylight.org>; Kochba, Alon 
<alo...@hpe.com>; integration-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Subject: RE: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: 
OutOfMemory from datastore

Thanks for the explanation, I finally understand.
I will print this reply and paste it next to my computer ☺

So, getting back to my bug-prone question: I remember Robert saying in Seattle 
that you shouldn't write in the same transaction to different shards. Because 
you can't know in the code how the final sharding will be configured, would you 
say it is good practice to avoid writing to different yang modules in the same 
transaction?

From: Muthukumaran K [mailto:muthukumara...@ericsson.com]
Sent: Thursday, January 12, 2017 11:08 AM
To: Sela, Guy <guy.s...@hpe.com>; Tom Pantelis <tompante...@gmail.com>
Cc: odl netv

Re: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: OutOfMemory from datastore

2017-01-12 Thread Muthukumaran K
Hi Sela,

>>> Transaction is wrt a shard

As an afterthought: from the databroker perspective, while creating a 
transaction we do not tie it to a yang-instance-identifier; only the operations 
within the transaction reflect one. So the contract itself does not impose this 
(I vaguely recollect that Sela had brought this up sometime in September 2016, 
and Tom and Robert had a good explanation in [1]).

So, it's more a matter of usage discipline.

[1] - 
https://lists.opendaylight.org/pipermail/controller-dev/2016-September/012646.html

One more thing: if you want to look at the config and operational shards in the 
controller, JConsole (or access to it via Jolokia) is a good way to check. Via 
Jolokia, the following URLs give the same information.

To find the list of operational shards:
HTTP GET http://<controller-ip>:8181/jolokia/read/org.opendaylight.controller:Category=ShardManager,name=shard-manager-operational,type=DistributedOperationalDataStore

To find the list of config shards:
HTTP GET http://<controller-ip>:8181/jolokia/read/org.opendaylight.controller:Category=ShardManager,name=shard-manager-config,type=DistributedConfigDataStore
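Building those URLs programmatically is mostly a matter of percent-encoding the MBean name; a small sketch (assumes the default Jolokia endpoint on port 8181, and the MBean naming pattern shown above):

```python
from urllib.parse import quote

def jolokia_shard_manager_url(host, datastore="operational"):
    """Build a Jolokia read URL for the ODL ShardManager MBean.

    `datastore` is "operational" or "config"; the MBean name follows the
    Category=ShardManager pattern quoted on this list.
    """
    store_type = ("DistributedOperationalDataStore"
                  if datastore == "operational"
                  else "DistributedConfigDataStore")
    mbean = ("org.opendaylight.controller:Category=ShardManager,"
             "name=shard-manager-%s,type=%s" % (datastore, store_type))
    # Encode the MBean name so ':', ',' and '=' survive as one path segment.
    return "http://%s:8181/jolokia/read/%s" % (host, quote(mbean, safe=""))

print(jolokia_shard_manager_url("192.0.2.1"))
print(jolokia_shard_manager_url("192.0.2.1", datastore="config"))
```

The response is JSON describing the shard manager, including the shard names known to that node.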


Regards
Muthu


From: Vishal Thapar
Sent: Thursday, January 12, 2017 3:48 PM
To: Muthukumaran K ; Sela, Guy ; 
Tom Pantelis 
Cc: odl netvirt dev ; 
controller-dev@lists.opendaylight.org; integration-...@lists.opendaylight.org; 
Kochba, Alon 
Subject: RE: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: 
OutOfMemory from datastore

Hi Guy,

You can look at BatchingUtils in Genius that we use for InterfaceManager. Since 
all IFM data goes to TopologyConfig or Default config shard, we batch 
transactions separately. As of today we do know statically which yang model 
goes to which shard and we’re taking advantage of it to do batching in 
Genius/Netvirt.

Regards,
Vishal.

From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of 
Muthukumaran K
Sent: 12 January 2017 15:00
To: Sela, Guy <guy.s...@hpe.com>; Tom Pantelis <tompante...@gmail.com>
Cc: odl netvirt dev <netvirt-...@lists.opendaylight.org>; 
controller-dev@lists.opendaylight.org; integration-...@lists.opendaylight.org; 
Kochba, Alon <alo...@hpe.com>
Subject: Re: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: 
OutOfMemory from datastore

Hi Sela,

A transaction is with respect to a shard. So, if we take Netvirt for example, 
all the config yang modules get mapped to the default config shard. So, it's OK 
to have a single config transaction spanning multiple yang models.

But if you are configuring a flow (which maps to the Inventory config shard) 
and also some tunnel config (which could be part of the default config shard) 
in the same transaction, then we cross shard boundaries, and what Robert says 
holds true for that case.

Regards
Muthu


From: Sela, Guy [mailto:guy.s...@hpe.com]
Sent: Thursday, January 12, 2017 2:53 PM
To: Muthukumaran K <muthukumara...@ericsson.com>; Tom Pantelis 
<tompante...@gmail.com>
Cc: odl netvirt dev <netvirt-...@lists.opendaylight.org>; Kochba, Alon 
<alo...@hpe.com>; integration-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Subject: RE: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: 
OutOfMemory from datastore

Thanks for the explanation, I finally understand.
I will print this reply and paste it next to my computer ☺

So, getting back to my bug-prone question: I remember Robert saying in Seattle 
that you shouldn't write in the same transaction to different shards. Because 
you can't know in the code how the final sharding will be configured, would you 
say it is good practice to avoid writing to different yang modules in the same 
transaction?

From: Muthukumaran K [mailto:muthukumara...@ericsson.com]
Sent: Thursday, January 12, 2017 11:08 AM
To: Sela, Guy <guy.s...@hpe.com>; Tom Pantelis <tompante...@gmail.com>
Cc: odl netvirt dev <netvirt-...@lists.opendaylight.org>; Kochba, Alon 
<alo...@hpe.com>; integration-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Subject: RE: [control

Re: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: OutOfMemory from datastore

2017-01-12 Thread Muthukumaran K
Hi Sela,

A transaction is with respect to a shard. So, if we take Netvirt for example, 
all the config yang modules get mapped to the default config shard. So, it's OK 
to have a single config transaction spanning multiple yang models.

But if you are configuring a flow (which maps to the Inventory config shard) 
and also some tunnel config (which could be part of the default config shard) 
in the same transaction, then we cross shard boundaries, and what Robert says 
holds true for that case.

Regards
Muthu


From: Sela, Guy [mailto:guy.s...@hpe.com]
Sent: Thursday, January 12, 2017 2:53 PM
To: Muthukumaran K ; Tom Pantelis 

Cc: odl netvirt dev ; Kochba, Alon 
; integration-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Subject: RE: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: 
OutOfMemory from datastore

Thanks for the explanation, I finally understand.
I will print this reply and paste it next to my computer ☺

So, getting back to my bug-prone question: I remember Robert saying in Seattle 
that you shouldn't write in the same transaction to different shards. Because 
you can't know in the code how the final sharding will be configured, would you 
say it is good practice to avoid writing to different yang modules in the same 
transaction?

From: Muthukumaran K [mailto:muthukumara...@ericsson.com]
Sent: Thursday, January 12, 2017 11:08 AM
To: Sela, Guy <guy.s...@hpe.com>; Tom Pantelis <tompante...@gmail.com>
Cc: odl netvirt dev <netvirt-...@lists.opendaylight.org>; Kochba, Alon 
<alo...@hpe.com>; integration-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Subject: RE: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: 
OutOfMemory from datastore

Hi Sela,

There are 3 related aspects: module - a yang tree / yang module; shard - a 
group of yang trees whose data is placed together; and sharding strategy - how 
trees and shards are related.

The default strategy in ODL is module-level. By that virtue, we can control the 
mapping to the extent of one yang module to one shard.

When Tom says that everything else goes into the default shard, it's the 
built-in behavior of the sharding and not user-controllable. So, at the 
user-controllable level it's 1:1 (module:shard), and whatever module is not 
configured by the user with a dedicated shard goes into the default shard.

Effectively, by default, we would have inventory x 2 + topology x 2 + default 
x 2 + toaster x 2 + EOS shard (implicit, not configured) x 1 = 9 shards in the 
system. Here "x 2" means one for oper and one for config.

This section describes the default behavior - 
https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:Architecture:Clustering#Configuration
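For illustration, a dedicated shard for a hypothetical module could be declared along these lines - a rough sketch of the HOCON used by modules.conf and module-shards.conf; the module name, namespace, and member names here are invented, not taken from a real deployment:

```hocon
# modules.conf - map the module's namespace to the module-level strategy
modules = [
    {
        name = "elan"
        namespace = "urn:opendaylight:netvirt:elan"
        shard-strategy = "module"
    }
]

# module-shards.conf - place the shard's replicas on cluster members
module-shards = [
    {
        name = "elan"
        shards = [
            {
                name = "elan"
                replicas = ["member-1", "member-2", "member-3"]
            }
        ]
    }
]
```

Any module without such an entry falls back to the default shard, as described above.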


Regards


From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Sela, Guy
Sent: Wednesday, January 11, 2017 8:37 PM
To: Tom Pantelis <tompante...@gmail.com>
Cc: odl netvirt dev <netvirt-...@lists.opendaylight.org>; Kochba, Alon 
<alo...@hpe.com>; integration-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Subject: Re: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: 
OutOfMemory from datastore

Oh, so that's really different from option 1 I wrote.
You are saying that I have the capability of creating shards by taking 
different yang trees and combining them into shards?
My smallest unit of work is a yang tree?

I still don't see how it is done.
Let's say I wanted to take the 2 trees in my example and put them in one shard 
only for them.
What will module-shards.conf look like, and what will modules.conf look like?
If you have an example of that in some wiki, you can just point me to that.


From: Tom Pantelis [mailto:tompante...@gmail.com]
Sent: Wednesday, January 11, 2017 4:58 PM
To: Sela, Guy <guy.s...@hpe.com>
Cc: Robert Varga <n...@hq.sk>; Kochba, Alon <alo...@hpe.com>; Williams, Marcus 
<marcus.willi...@intel.com>; Daniel Farrell <dfarr...@redhat.com>; odl netvirt 
dev <netvirt-...@lists.opendaylight.org>; 
controller-dev@lists.opendaylight.org; 
integration-...@lists.opendaylight.org

Shards are (currently) statically configured in module-shards.conf. There are 3 
OOB - "topology", "inventory", and "default". Anything not under topology and 
inventory goes into the default shard.

Re: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: OutOfMemory from datastore

2017-01-12 Thread Muthukumaran K
Hi Sela,

There are 3 related aspects: module - a yang tree / yang module; shard - a 
group of yang trees whose data is placed together; and sharding strategy - how 
trees and shards are related.

The default strategy in ODL is module-level. By that virtue, we can control the 
mapping to the extent of one yang module to one shard.

When Tom says that everything else goes into the default shard, it's the 
built-in behavior of the sharding and not user-controllable. So, at the 
user-controllable level it's 1:1 (module:shard), and whatever module is not 
configured by the user with a dedicated shard goes into the default shard.

Effectively, by default, we would have inventory x 2 + topology x 2 + default 
x 2 + toaster x 2 + EOS shard (implicit, not configured) x 1 = 9 shards in the 
system. Here "x 2" means one for oper and one for config.

This section describes the default behavior - 
https://wiki.opendaylight.org/view/OpenDaylight_Controller:MD-SAL:Architecture:Clustering#Configuration


Regards

From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Sela, Guy
Sent: Wednesday, January 11, 2017 8:37 PM
To: Tom Pantelis 
Cc: odl netvirt dev ; Kochba, Alon 
; integration-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Subject: Re: [controller-dev] [netvirt-dev] [mdsal-dev] Netvirt Scale tests: 
OutOfMemory from datastore

Oh, so that's really different from option 1 I wrote.
You are saying that I have the capability of creating shards by taking 
different yang trees and combining them into shards?
My smallest unit of work is a yang tree?

I still don't see how it is done.
Let's say I wanted to take the 2 trees in my example and put them in one shard 
only for them.
What will module-shards.conf look like, and what will modules.conf look like?
If you have an example of that in some wiki, you can just point me to that.


From: Tom Pantelis [mailto:tompante...@gmail.com]
Sent: Wednesday, January 11, 2017 4:58 PM
To: Sela, Guy <guy.s...@hpe.com>
Cc: Robert Varga <n...@hq.sk>; Kochba, Alon <alo...@hpe.com>; Williams, Marcus 
<marcus.willi...@intel.com>; Daniel Farrell <dfarr...@redhat.com>; odl netvirt 
dev <netvirt-...@lists.opendaylight.org>; 
controller-dev@lists.opendaylight.org; 
integration-...@lists.opendaylight.org
Subject: Re: [netvirt-dev] [controller-dev] [mdsal-dev] Netvirt Scale tests: 
OutOfMemory from datastore

Shards are (currently) statically configured in module-shards.conf. There are 3 
OOB - "topology", "inventory", and "default". Anything not under topology and 
inventory goes into the default shard.

On Wed, Jan 11, 2017 at 9:51 AM, Sela, Guy 
mailto:guy.s...@hpe.com>> wrote:
So what you mean is that if I create a yang tree in a yang file, it will 
ultimately translate into maximum two shards?
One for the operational and one for the configuration?

So for example elan.yang:
container elan-interface-forwarding-entries {
    config false;

    list elan-interface-mac {
        key "elan-interface";
        description "All the MAC addresses learned on a particular elan interface";
        max-elements "unbounded";
        min-elements "0";
        leaf elan-interface {
            type leafref {
                path "/if:interfaces/if:interface/if:name";
            }
        }

        uses forwarding-entries;
    }
}

container elan-tag-name-map {
    config false;

    list elan-tag-name {
        key elan-tag;
        leaf elan-tag {
            type uint32;
        }

        leaf name {
            type string;
            description "The name of the elan-instance.";
        }
    }
}

These 2 only live in the operational datastore (because of config false), so 
does that mean 2 shards?

-Original Message-
From: Robert Varga [mailto:n...@hq.sk]
Sent: Wednesday, January 11, 2017 4:45 PM
To: Sela, Guy <guy.s...@hpe.com>; Tom Pantelis <tompante...@gmail.com>; Kochba, 
Alon <alo...@hpe.com>
Cc: Williams, Marcus <marcus.willi...@intel.com>; Daniel Farrell 
<dfarr...@redhat.com>; odl netvirt dev 
<netvirt-...@lists.opendaylight.org>; 
controller-dev@lists.opendaylight.org; 
integration-...@lists.opendaylight.org
Subject: Re: [netvirt-dev] [controller-dev] [mdsal-dev] Netvirt Scale tests: 
OutOfMemory from datastore
On 01/11/2017 03:42 PM, Sela, Guy wrote:
> I have some blurriness about what a shard is, that I still didn’t
> figure out.
>
> I have some guesses:
>
> 1)  Every yang tree == one shard.
>
> 2)  Shard can be a collection of a number of yang trees.
>
> 3)  None of the above?
>

Mostly 1. Each shar

Re: [controller-dev] Backup and Restore

2017-01-11 Thread Muthukumaran K
Hi Miguel,

There are actually 3 options possible as of now for taking a backup and 
restoring it in ODL:


a)  Using the Cluster-Admin service, you can take a backup of the Config 
datastore and restore it on another system.

b)  There is a separate project called DAEXIM (data export and import) to 
take a snapshot of the current state of the Config as well as the Oper 
datastores - think of it more as a tool which can take a state snapshot in JSON 
and play it back on a clean-slate system, bringing it to the state of the 
running system, e.g. for troubleshooting purposes.

c)  A simpler way - but one that always works - is to take a backup of the 
snapshots and journals of the current datastore and use them in a new 
installation. This applies to whichever datastore has persistence enabled.

Of course, all the above approaches assume that the yang models of the 
backed-up data have not changed between the source and target (where you 
restore) versions of the system.
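Option (c) boils down to a file copy of the persistence directories. A minimal sketch, assuming a default Karaf-based distribution layout (the `journal` and `snapshots` directories under the ODL home; adjust paths per install, and stop the controller first so the LevelDB files are quiescent):

```shell
# backup_odl SRC DEST: copy the persisted datastore state (journal and
# snapshots directories) from an ODL home SRC into backup directory DEST.
backup_odl() {
    src="$1"
    dest="$2"
    mkdir -p "$dest"
    # These two directories hold everything the persistent datastore
    # needs to recover its state on a fresh installation.
    cp -r "$src/journal" "$dest/"
    cp -r "$src/snapshots" "$dest/"
    echo "Backed up $src to $dest"
}

# Example (run with the controller stopped):
#   backup_odl /opt/opendaylight "/var/backups/odl-$(date +%Y%m%d)"
```

Restoring is the reverse copy into a clean installation before first start, subject to the same "yang models unchanged" caveat above.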

Regards
Muthu




From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Miguel 
Angel Muñoz Gonzalez
Sent: Thursday, January 12, 2017 12:10 AM
To: controller-dev@lists.opendaylight.org
Cc: disc...@lists.opendaylight.org
Subject: [controller-dev] Backup and Restore

Hi everyone,
We are trying to implement a backup/restore mechanism for ODL. I have not found 
much information about it in the pipermail history, except some questions and 
brief proposals (such as generating a snapshot and storing it in a particular 
file, using an external database, ...). I would appreciate it if someone 
familiar with the topic could give us some hints:


-Is it possible to back up the MD-SAL Configuration datastore as of today? 
(I suppose it's a matter of backing up the leveldb files.)


-If so, how can we guarantee that the backup is correct and/or consistent 
while traffic is running and the datastore is being modified?


-Supposedly there is a working mechanism consisting of stopping ODL, 
copying the files manually, and starting it up again. However, it would be nice 
to know if there is a more sophisticated mechanism to back up the datastore, 
especially one that does not imply stopping ODL, e.g. a particular API or tool 
that can be called to perform this activity.

Thank you very much,
Best Regards,
Miguel Ángel.
___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev


Re: [controller-dev] Interested in Contribution : "Replacing the MD-SAL data store with etcd"

2017-01-01 Thread Muthukumaran K
Thanks for the explanation Robert. 

When we refer to access patterns and shard layout, I assume that it is in the 
context of a given transaction owned by an application - please correct me if I 
am wrong. 

As a detour, the reference to transactions leads to a bunch of questions on how 
MD-SAL transactions - as we know them in the context of IMDS - could map onto 
an external backend for a given shard (which CDT enables). 

For a few transaction capabilities which are inherent in IMDS, there may not be 
an equivalent concept or support in the external backend of choice - for 
example, a concept like Transaction Chain. 

Other implementation-specific aspects of transactions could also vary 
drastically. For example: 
a) Snapshotting + MVCC as the "I" of ACID applies to IMDS, but may not hold for 
etcd/Cassandra - this mismatch may not have deeper ramifications for the 
end user. 
b) Strong consistency for writes may be do-it-yourself if the backend choice is 
Cassandra, whereas it is ingrained in IMDS.

Does this essentially mean that, based on the choice of backend, some 
transaction concepts could just be no-ops or throw an unsupported exception 
(e.g. transaction-chain) to preserve the uniformity of the broker contracts?
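The fail-fast alternative in that question could look like a thin adapter that rejects contract features the backend lacks. A toy sketch - the class and method names below are invented, not an ODL or etcd API:

```python
class EtcdTransactionChainShim:
    """Hypothetical adapter illustrating one way a backend without a
    native transaction-chain concept could honour the broker contract:
    fail fast with an unsupported-operation error rather than silently
    degrading semantics."""

    SUPPORTS_TRANSACTION_CHAINS = False

    def create_transaction_chain(self, listener=None):
        if not self.SUPPORTS_TRANSACTION_CHAINS:
            # Keeps the contract uniform: callers learn immediately that
            # this backend cannot provide chained-transaction ordering.
            raise NotImplementedError(
                "transaction chains are not supported by this backend")
```

The trade-off is between uniform contracts with runtime rejection (as here) and advertising capabilities up front so applications can adapt.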

Regards
Muthu


 

-Original Message-
From: Robert Varga [mailto:n...@hq.sk] 
Sent: Tuesday, December 27, 2016 9:10 PM
To: Muthukumaran K ; Colin Dixon 
; Shrenik Jain 
Cc: integration-...@lists.opendaylight.org ; 
controller-dev ; 
mdsal-...@lists.opendaylight.org; intern-ad...@opendaylight.org
Subject: Re: [controller-dev] Interested in Contribution : "Replacing the 
MD-SAL data store with etcd"

On 12/27/2016 11:38 AM, Muthukumaran K wrote:
> Hi Robert,
> 
> Looking at 
> https://github.com/opendaylight/mdsal/blob/master/dom/mdsal-dom-api/sr
> c/main/java/org/opendaylight/mdsal/dom/api/DOMDataTreeShardingService.
> java
> 
> Few clarifications:
> For same prefix duplicate producers are not allowed - this is 
> understandable. But there is a possibility that two distinct 
> producers/shards can have overlapping prefixes - if this is allowed, 
> would not there be a scenario wherein we could end up with a superset 
> shard and subset shard
> 
> Eg. Let's assume there is a subtree whose prefix is /a/b/c/d mapped to shard 
> 1 and another /a/b/c/d/e/f mapped to another shard. 
> In other words, we can have recursive shards (hypothetical worst case could 
> be that these recursive shards could have their own backend instances).
> Would this not lead to relatively complex read and write routing 
> implementations? Are such overlaps prevented by Sharding service contracts ?

This is allowed and the two shards have a parent/child relationship, where the 
parent understands that it has a subordinate shard. There can (in theory) be 
infinite nesting, although I am not sure if the implementation handles more 
than one level.

As for complexity of data access... it is *relatively* complex, but all it 
really boils down to is a recursive scatter/gather algorithm, where the 
superior shard delegates parts of the request to its subordinates and merges 
the responses -- which is pretty much bread-and-butter when dealing with 
tree-based structures.

In any case, while possible, we envision that most applications will only talk 
to leaf shards, hence scatter/gather will not kick in. The cost of determining 
which shard is the apex for a producer is paid when the producer is 
instantiated, hence the steady-state cost should typically be zero.

If the application access pattern and sharding layout do not align, you will 
get sucky performance, similar to what you get if you try to write to multiple 
module-based shards today. But that is a deployment-time engineering and 
application co-existence issue, which we cannot solve at the infrastructure 
layer.
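The apex-shard determination Robert mentions is essentially a longest-prefix match over the registered shard prefixes. A toy sketch of that lookup (shard names and prefixes are invented, mirroring the /a/b/c/d example above):

```python
def find_apex_shard(shards, path):
    """Return the shard whose registered prefix is the longest prefix of
    `path`. `shards` maps prefix tuples to shard names; returns None if
    no registered prefix covers the path."""
    best = None
    for prefix, name in shards.items():
        if path[:len(prefix)] == list(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, name)
    return best[1] if best else None

# Nested shards: /a/b/c/d is the parent, /a/b/c/d/e/f the subordinate.
shards = {
    ("a", "b", "c", "d"): "shard-1",
    ("a", "b", "c", "d", "e", "f"): "shard-2",
}
print(find_apex_shard(shards, ["a", "b", "c", "d", "x"]))           # shard-1
print(find_apex_shard(shards, ["a", "b", "c", "d", "e", "f", "g"]))  # shard-2
```

As the thread notes, this cost is paid once at producer instantiation, so the steady-state routing cost can be zero.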

Regards,
Robert

___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev


Re: [controller-dev] Interested in Contribution : "Replacing the MD-SAL data store with etcd"

2016-12-27 Thread Muthukumaran K
Hi Robert, 

Looking at 
https://github.com/opendaylight/mdsal/blob/master/dom/mdsal-dom-api/src/main/java/org/opendaylight/mdsal/dom/api/DOMDataTreeShardingService.java

Few clarifications:
For same prefix duplicate producers are not allowed - this is understandable. 
But there is a possibility that two distinct producers/shards can have 
overlapping prefixes - if this is allowed, would not there be a scenario 
wherein we could end up with a superset shard and subset shard 

Eg. Let's assume there is a subtree whose prefix is /a/b/c/d mapped to shard 1 
and another /a/b/c/d/e/f mapped to another shard. 
In other words, we can have recursive shards (hypothetical worst case could be 
that these recursive shards could have their own backend instances).
Would this not lead to relatively complex read and write routing 
implementations? Are such overlaps prevented by Sharding service contracts ?

Regards
Muthu


-Original Message-
From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Robert Varga
Sent: Sunday, December 25, 2016 11:35 PM
To: Colin Dixon ; Shrenik Jain 

Cc: integration-...@lists.opendaylight.org ; 
controller-dev ; 
mdsal-...@lists.opendaylight.org; intern-ad...@opendaylight.org
Subject: Re: [controller-dev] Interested in Contribution : "Replacing the 
MD-SAL data store with etcd"

On 12/23/2016 05:39 PM, Colin Dixon wrote:
> It looks like this might be the most real it got:
> https://git.opendaylight.org/gerrit/#/c/14998/
> 

It has gotten way more real than that, benchmarks for IMDS were presented at 
the last ODL Summit.

Base entrypoint is implemented here:

https://github.com/opendaylight/mdsal/blob/master/dom/mdsal-dom-broker/src/main/java/org/opendaylight/mdsal/dom/broker/ShardedDOMDataTree.java

Helper classes for shard implementations live here:
https://github.com/opendaylight/mdsal/tree/master/dom/mdsal-dom-spi/src/main/java/org/opendaylight/mdsal/dom/spi/shard

IMDS implementation lives here:
https://github.com/opendaylight/mdsal/blob/master/dom/mdsal-dom-inmemory-datastore/src/main/java/org/opendaylight/mdsal/dom/store/inmemory/InMemoryDOMDataTreeShard.java

CDS implementation is under review, patch series starts here:
https://git.opendaylight.org/gerrit/44943

Regards,
Robert

___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev


Re: [controller-dev] Upgrade design discussions - continuity of last design summit session

2016-12-10 Thread Muthukumaran K
Thanks Robert. 

>>> Oracle's 'STARTUP UPGRADE' does a similar thing

This analogy improves my understanding :-)

Regards
Muthu


-Original Message-
From: Robert Varga [mailto:n...@hq.sk] 
Sent: Saturday, December 10, 2016 4:28 PM
To: Muthukumaran K ; Colin Dixon 
; Tom Pantelis 
Cc: Shuva Jyoti Kar ; Ashvin Lakshmikantha 
; controller-dev@lists.opendaylight.org
Subject: Re: Upgrade design discussions - continuity of last design summit 
session

On 12/10/2016 09:51 AM, Muthukumaran K wrote:
> Hi Robert,
> 
> Is the approach more on lines of following (trying to wrap around my 
> head where schemacontext persistence fits in)
> 
> a) extract data from old version when cluster is running (with some 
> network-fencing to prevent external interfaces - openflow, bgp, etc., 
> from mutating the config state when extraction is in progress - 
> indirectly a checkpoint). Output could be primarily XML - since this 
> is more amenable for subsequent transformations using tools like XSLT, 
> XPath etc
> 
> b) transform the data using offline artifacts (scripts / plain java 
> code artifacts) to match with target model
> 
> c) bring up the cluster with new version of software
> 
> d) load the transformed XML data via RESTCONF (at this point, system 
> is not yet open to outside world but some "privileged" access is given 
> for "upgrade tool" to load data via RESTCONF)
> 
> e) perform full cluster reboot before throwing floodgates open for 
> external interfaces
> 
> And this cycle happens for every model targeted for transformation. 
> 
> Even if schemacontext is persisted, it is going to represent older 
> version or newer version. But, in step (d) above system basically 
> requires schemacontext of newer version to apply transformed data to 
> CDS and not the one of old schemacontext. Is this what you meant in 
> second comment of the bug (unless I have  misread)
> 
> Am I missing something basic ?

The sequence you outlined could work, except that we cannot use RESTCONF or 
really any other application, as that requires the datastore to be fully 
operational - or we would have to expose some sort of lifecycle hooks...

I have not delved into the design of that part, the primary objective is to 
have the Shard recovery not use PruningDataTreeModification, but the old 
SchemaContext, thus preserving all previous data during recovery.

Once we replay the journal, we will end up being in the situation, where we 
have all of the old data, the old SchemaContext to go with it and the new 
SchemaContext.

At that point, the datastore can call out to an upgrade component, asking the 
question: what is the upgrade *transaction* needed to switch the SchemaContexts 
without losing data?

After the upgrade component responds, the data store will apply that 
transaction to the DataTree, persist an 'upgrade' journal entry, and switch to 
being fully operational.

To liken this to an existing database system -- Oracle's 'STARTUP UPGRADE' does 
a similar thing -- it brings the database up to the point where you can run DB 
upgrade scripts.

Bye,
Robert

___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev


[controller-dev] Clarification on RESTCONF password change methodology in cluster

2016-11-30 Thread Muthukumaran K
Hi,

If we follow the procedure explained under the section "Through REST Endpoints" at 
the link - https://wiki.opendaylight.org/view/AAA:Changing_Account_Passwords - in 
a 3-node cluster, should the steps be executed on all 3 cluster nodes? Asking 
because the file, idmlight.db.mv.db, is local to each node.

Please clarify if I am missing something

Regards
Muthu






Re: [controller-dev] Global RPCs aren't delegated in a cluster?

2016-11-29 Thread Muthukumaran K
Hi Robert, 

I searched around for a specific definition of 'nearest' in the RFC, but could 
not find one. So, in the case of a symmetric deployment (or, as Guy mentioned, when 
multiple RoutedRPCs are registered with the same route-key across the cluster), 
nearest could only imply network-local - in which case the current 
implementation of local routing satisfies the requirement, right?



Regards
Muthu

-Original Message-
From: Robert Varga [mailto:n...@hq.sk] 
Sent: Tuesday, November 29, 2016 6:20 PM
To: Sela, Guy ; Muthukumaran K ; 
controller-dev@lists.opendaylight.org; mdsal-...@lists.opendaylight.org
Subject: Re: [controller-dev] Global RPCs aren't delegated in a cluster?

On 11/29/2016 01:47 PM, Sela, Guy wrote:
> Great, but as I understand from these bugs:
> https://bugs.opendaylight.org/show_bug.cgi?id=6310
> https://bugs.opendaylight.org/show_bug.cgi?id=3128
> 
> Global RPC today has a bug that it is not routed to the nearest registered 
> implementation inside a cluster, and therefore I need to use a RoutedRPC as a 
> workaround.
> Right?

Correct, and we need to fix the clustered implementation.

Regards,
Robert



Re: [controller-dev] Controller to controller communication!

2016-11-25 Thread Muthukumaran K
Hi Pragati,

Are you looking for clustering of ODL? If so, you may want to have a look at a 
good introduction here - https://www.youtube.com/watch?v=A9wAAbvliR0
There are also presentations on clustering on the ODL wiki.

ODL clustering internally uses messaging for inter-node (controller-instance) 
communication and provides higher-level abstractions such as 
cluster-wide data-change notifications (in other words, you can set up 
listeners cluster-wide and receive notifications whenever MD-SAL datastore data 
is changed).

I am not sure whether that suits your use case. But if you could 
explain your specific use case, it would be easier to see whether any existing 
mechanisms can be used.

Regards
Muthu



From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Pragati 
Shrivastava
Sent: Friday, November 25, 2016 3:29 PM
To: controller-dev 
Subject: [controller-dev] Controller to controller communication!

Hi all,
I want to exchange messages between two OpenDaylight controllers.
How can this be done? Is there an API that helps establish 
communication between two different OpenDaylight controllers?
Please guide me through this.
Thank you.
--
Pragati Shrivastava
Phd Scholar,
Dept. of Computer Science & Engineering,
Indian Institute of Technology Hyderabad,
Email: cs14resch11...@iith.ac.in
pragatipr...@gmail.com
Cell: 09966154652


Re: [controller-dev] Global RPCs aren't delegated in a cluster?

2016-11-23 Thread Muthukumaran K
Hi Guy,

To your previous mail: yes, the routing is currently at the local-node level.

With regard to your second mail on routed RPCs, that is roughly how it works in the 
context of the OpenFlow plugin. Which node acts as the current active RPC provider 
for a given route-key (a switch-id in this case) is determined using the 
cluster-singleton mechanism.

Coming back to global RPCs and their routing across the cluster, I had a clarification 
which I have posted as part of bug 
https://bugs.opendaylight.org/show_bug.cgi?id=3128 (pasting below for ready 
reference):

In a symmetric cluster, there can be multiple routees as providers of the same RPC 
service. In that case, how would a single local or remote provider be chosen?

In the case of routed RPCs, the route-key is unique across the cluster, so the RPC 
target address resolves to one and only one of the N cluster nodes. But in the case 
of global RPCs, there can be multiple target addresses for the same RPC, and 
resolving them could imply some kind of routing policy to narrow down to a single 
target.
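The contrast between the two resolution problems can be sketched in plain Java. The member names, route keys, and the "prefer local, else first remote" policy below are illustrative assumptions, not the actual MD-SAL implementation:

```java
import java.util.List;
import java.util.Map;

public class RpcRoutingSketch {
    // Routed RPC: the route-key (e.g. a switch-id) is unique cluster-wide,
    // so it resolves to exactly one provider node.
    static final Map<String, String> ROUTED = Map.of(
            "openflow:1", "member-1",
            "openflow:2", "member-3");

    static String resolveRouted(String routeKey) {
        return ROUTED.get(routeKey);
    }

    // Global RPC in a symmetric cluster: several nodes register the same RPC,
    // so picking a single target needs an explicit routing policy. Shown here
    // is one trivial candidate policy: prefer the local node, else the first
    // registered remote provider.
    static String resolveGlobal(List<String> providers, String localNode) {
        return providers.contains(localNode) ? localNode : providers.get(0);
    }

    public static void main(String[] args) {
        System.out.println(resolveRouted("openflow:2"));
        System.out.println(resolveGlobal(List.of("member-1", "member-2"), "member-2"));
        System.out.println(resolveGlobal(List.of("member-1", "member-2"), "member-3"));
    }
}
```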





Regards
Muthu


From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Sela, Guy
Sent: Wednesday, November 23, 2016 9:58 PM
To: controller-dev@lists.opendaylight.org; mdsal-...@lists.opendaylight.org
Subject: Re: [controller-dev] Global RPCs aren't delegated in a cluster?

Also, can I just use a workaround of a RoutedRPC with only one context that 
will be registered by the leader?


From: Sela, Guy
Sent: Wednesday, November 23, 2016 6:10 PM
To: controller-dev@lists.opendaylight.org; 'mdsal-...@lists.opendaylight.org'
Subject: Global RPCs aren't delegated in a cluster?

Hi,

I saw these bugs:
https://bugs.opendaylight.org/show_bug.cgi?id=6310
https://bugs.opendaylight.org/show_bug.cgi?id=3128

From my understanding, these bugs mean that if I register a global RPC in a 
leader instance in a cluster, when invoking the RPC on a non-leader instance, 
the invocation won't be delegated to the leader.
Is this correct?
If so, is there any estimation when will this be fixed?

Thanks,
Guy Sela



Re: [controller-dev] Some DDF notes from 9/29/2016

2016-09-30 Thread Muthukumaran K
Notes on messaging :

Another important capability of the current MD-SAL DTCN/DCN is that it is also 
retrospective in nature (i.e. even when registration happens after the 
changes have been committed, late-comers still get the change notifications), 
much like persisted messaging in a conventional brokered messaging system. This 
plays a major role in applications not having to worry much about 
"recovery" to rebuild their ephemeral state from such notifications, even 
if they come up later in their boot order.

I have been examining a few open-source frameworks of late which provide similar 
capabilities, and will post my comparison to the list soon. Something which 
comes relatively close in terms of this capability is NATS Streaming 
(http://nats.io/documentation/streaming/nats-streaming-intro/). The client contract 
resembles that of Kafka (clients remember the offset of their last consumption, or 
they can consume from the beginning, if the persistence layer can tolerate that volume).
Apart from location-optimized delivery similar to Akka's Distributed Pub/Sub, 
NATS claims support for request/reply, pub/sub and work-distribution 
patterns.

Of course, NATS is a brokered cluster by itself ☺
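A toy model of the retrospective (replay-on-registration) delivery contract described above, in plain Java; this illustrates the contract only, and is neither the MD-SAL nor the NATS implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class RetrospectiveNotificationsSketch {
    private final List<String> committedChanges = new ArrayList<>();
    private final List<Consumer<String>> listeners = new ArrayList<>();

    // Commit a change and deliver it live to all current listeners.
    void commit(String change) {
        committedChanges.add(change);
        listeners.forEach(l -> l.accept(change));
    }

    // A late-comer first replays every change committed before it registered,
    // then keeps receiving live notifications -- no separate recovery step.
    void register(Consumer<String> listener) {
        committedChanges.forEach(listener);
        listeners.add(listener);
    }

    public static void main(String[] args) {
        RetrospectiveNotificationsSketch bus = new RetrospectiveNotificationsSketch();
        bus.commit("node-1 added");
        bus.commit("node-2 added");

        List<String> seen = new ArrayList<>();
        bus.register(seen::add);      // registers *after* the first two commits
        bus.commit("node-3 added");

        System.out.println(seen);     // all three changes, in commit order
    }
}
```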

Regards
Muthu





From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Colin Dixon
Sent: Friday, September 30, 2016 11:09 PM
To: ODL Dev; controller-dev
Subject: [controller-dev] Some DDF notes from 9/29/2016

See attached files for raw notes.
--Colin


Re: [controller-dev] [documentation] Questions about ODL clustering

2016-09-18 Thread Muthukumaran K
Hi Guy, 

There is a JMX operation (which can be invoked via JConsole) to dump a snapshot on 
demand, on a per-shard basis, for the config DS shards.

Transactions do take snapshots, but they are in-memory. As you surmise, 
transaction-level snapshotting is the "Isolation" part of ACID.
A good explanation was given by Robert in the following mail threads:

https://lists.opendaylight.org/pipermail/controller-dev/2016-July/012298.html
https://lists.opendaylight.org/pipermail/controller-dev/2016-July/012331.html

Regards
Muthu


-Original Message-
From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Sela, Guy
Sent: Sunday, September 18, 2016 5:59 PM
To: Robert Varga; Tom Pantelis
Cc: controller-dev
Subject: Re: [controller-dev] [documentation] Questions about ODL clustering

What I mean is to be able to create a snapshot of the entire state and not only 
a specific data tree.


-Original Message-
From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Sela, Guy
Sent: Sunday, September 18, 2016 3:25 PM
To: Robert Varga ; Tom Pantelis 
Cc: controller-dev 
Subject: Re: [controller-dev] [documentation] Questions about ODL clustering

Thanks.
Are there any plans to do something regarding snapshots that resembles Snapshot 
Isolation in a DB?

For example:
https://msdn.microsoft.com/en-us/library/tcbchxcb(v=vs.110).aspx

-Original Message-
From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Robert Varga
Sent: Sunday, September 18, 2016 1:28 PM
To: Sela, Guy ; Tom Pantelis 
Cc: controller-dev 
Subject: Re: [controller-dev] [documentation] Questions about ODL clustering

On 09/15/2016 05:17 PM, Sela, Guy wrote:
> ReadOnlyTransaction tx = db.newReadOnlyTransaction();
> 
> CheckedFuture<Optional<DataObject>, ReadFailedException> read1 = tx.read(X, Y);
> 
> CheckedFuture<Optional<DataObject>, ReadFailedException> read2 = tx.read(Z, H);
> 
> read1 and read2 were read from different snapshots?
> 
> Does the answer change if they were invoked on the same/different data 
> trees?
> 

If both reads target the same shard, they will be executed from the same 
snapshot.
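A toy illustration of that guarantee: a read-only transaction captures an immutable snapshot of its shard's data at creation time, so writes made afterwards do not affect its reads. Plain Java with a map standing in for the data tree; this is not the actual CDS implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotReadSketch {
    // Stand-in for one shard's current data tree.
    final Map<String, String> shardData = new HashMap<>();

    // A read-only transaction captures an immutable snapshot of the shard
    // when created; every read it performs sees that snapshot, regardless
    // of concurrent writes committed later.
    final class ReadOnlyTx {
        private final Map<String, String> snapshot = Map.copyOf(shardData);

        String read(String path) {
            return snapshot.get(path);
        }
    }

    public static void main(String[] args) {
        SnapshotReadSketch shard = new SnapshotReadSketch();
        shard.shardData.put("X/Y", "1");
        shard.shardData.put("Z/H", "1");

        ReadOnlyTx tx = shard.new ReadOnlyTx();
        shard.shardData.put("Z/H", "2");      // write lands after the snapshot

        // Both reads come from the same consistent snapshot.
        System.out.println(tx.read("X/Y") + " " + tx.read("Z/H"));
    }
}
```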

Bye,
Robert



Re: [controller-dev] [documentation] Questions about ODL clustering

2016-09-14 Thread Muthukumaran K
Hi Srini,

We have tried the approach that Moiz mentioned (using CDTCN and caching 
data), and it was quite performant in one of our reference applications. You may 
want to look at https://git.opendaylight.org/gerrit/#/c/45131/

Regards
Muthu



From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Moiz Raja 
(moraja)
Sent: Thursday, September 15, 2016 3:06 AM
To: Tom Pantelis; Srini Seetharaman
Cc: controller-dev
Subject: Re: [controller-dev] [documentation] Questions about ODL clustering

The single-use ClusteredDataChange/ClusteredDataTreeChange listeners are fine 
and may perform better than the remote read, but if you really have a lot of 
reads, even this mechanism is expensive, as there is quite a bit of overhead 
associated with setting up a listener.

I would recommend that you setup a ClusteredDataTreeChangeListener (for long 
term use) for the data that you want to constantly read and cache the data in 
that listener. Then provide a way to read from that cache.

-Moiz

From: Tom Pantelis <tompante...@gmail.com>
Date: Wednesday, September 14, 2016 at 2:26 PM
To: Srini Seetharaman <srini.seethara...@gmail.com>
Cc: Moiz Raja <mor...@cisco.com>, controller-dev 
<controller-dev@lists.opendaylight.org>
Subject: Re: [controller-dev] [documentation] Questions about ODL clustering

All reads still go to the leader. There is an enhancement request, 
https://bugs.opendaylight.org/show_bug.cgi?id=2504, open for this, but it hasn't 
been implemented. There is an alternative way using a DataTreeChangeListener, as 
Moiz mentioned in the bug.

On Wed, Sep 14, 2016 at 4:57 PM, Srini Seetharaman 
<srini.seethara...@gmail.com> wrote:
With Beryllium-SR3, I just verified using tcpdump on port 2550 that the data 
for the read operation at the follower came over the network from the shard 
leader.

Is there any plan with Boron to make it a local read from the replica?

On Wed, Sep 14, 2016 at 1:43 PM, Srini Seetharaman 
<srini.seethara...@gmail.com> wrote:
Hi Tom and Moiz
Is it still the case with Beryllium and Boron that the read transactions from a 
follower are forwarded to the leader?

Thanks
Srini.

On Sat, Feb 28, 2015 at 8:26 AM, Tom Pantelis 
<tompante...@gmail.com> wrote:
Colin, Tianzhu

Reads are also forwarded to the leader, so yes, remote reads would take longer. 
With IMDS, reads are actually synchronous, so the returned Future is immediate; 
but with CDS, the read is async, whether it's local or not. So it's best not to 
block on the Future, as there will be some latency with CDS, but rather to use a 
Future callback if possible.

Tom

On Sat, Feb 28, 2015 at 10:42 AM, Colin Dixon 
<co...@colindixon.com> wrote:
I'm cc'ing controller-dev since they will have the authoritative answer.
I *think* the answer is that all data is replicated to all nodes in the cluster 
and so all reads can be local. Only writes have to go to the shard leader, but 
I could be wrong.
Moiz and Tom Pantelis would know more.

--Colin

On Sat, Feb 28, 2015 at 4:55 AM, 我心永恒 
<zhuzhuaiqiqi1...@gmail.com> wrote:
Dear all, I am studying ODL and have one question:

When a consumer launches a read transaction, if I'm not wrong, it has to be 
forwarded to the primary shard. Is it possible, then, that the transaction is 
remote and the consumer has to wait longer because the transaction is not 
local?

  Thanks & regards
  Tianzhu


___
documentation mailing list
documentat...@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/documentation




[controller-dev] External Datastore for ODL - an assessment

2016-09-06 Thread Muthukumaran K
Hi,

Context: there was an earlier mail enlisting the scale-out requirements - 
https://lists.opendaylight.org/pipermail/mdsal-dev/2016-April/000263.html
The main focus is throughput and the ability to scale out to multiple nodes with 
minimal degradation.

We have assessed this from the perspective of the current MD-SAL datastore approach 
and have written up what it would mean to integrate an external 
datastore.

https://www.dropbox.com/s/3bslos5qk75reas/OF-17-ODL-External-datastore-assessments-study-and-proposal-02-Sep-2016-final.docx?dl=0

We can discuss this over calls; perhaps the DDF would be a good forum, 
particularly when scaling and external-datastore integration topics are 
discussed.

Regards
Muthu



Re: [controller-dev] [mdsal-dev] Serialize/Deserialize DTOs to JSON

2016-08-31 Thread Muthukumaran K
Hi Robert, 

In this case, the CONSUMER *is cognizant* of which class (NetworkTopology.class 
in the case below) to use to get the NN (NormalizedNode) converted into the 
corresponding BA object:

DataObject obj = BindingNormalizedNodeCodecRegistry.fromNormalizedNode(
    YangInstanceIdentifier.of(BindingReflections.findQName(NetworkTopology.class)),
    normalizedNode);

So if there are going to be multiple JSONs representing different NNs (and hence 
different BA DTOs), it would not be possible to use this approach?

Regards
Muthu


-Original Message-
From: Robert Varga [mailto:n...@hq.sk] 
Sent: Thursday, September 01, 2016 4:27 AM
To: Sela, Guy; Muthukumaran K; controller-dev@lists.opendaylight.org; 
mdsal-...@lists.opendaylight.org; yangtools-...@lists.opendaylight.org
Subject: Re: [mdsal-dev] Serialize/Deserialize DTOs to JSON

On 08/24/2016 01:21 PM, Sela, Guy wrote:
> The only way it is working for me now, is the following (Some of it is 
> pseudo-code):
> "
> PRODUCER CODE:
> I have a 
> org.opendaylight.yang.gen.v1.urn.tbd.params.xml.ns.yang.network.topology.rev131021.network.topology.topology.Node
> in my hands (got it from a DCN).
> This is the code I need to run in order to do what I'm trying:
> TopologyBuilder topologyBuilder = new TopologyBuilder();
> topologyBuilder.setKey(new TopologyKey(new TopologyId(new Uri("ovsdb:1"))));
> topologyBuilder.setNode(Collections.singletonList(node));
> NetworkTopologyBuilder ntBuilder = new NetworkTopologyBuilder();
> ntBuilder.setTopology(Collections.singletonList(topologyBuilder.build()));
> InstanceIdentifier<NetworkTopology> path = 
> InstanceIdentifier.create(NetworkTopology.class);
> String json = TTPUtils.jsonStringFromDataObject(path, ntBuilder.build(), true);
> Serialize the json...
> 
> CONSUMER CODE:
> NormalizedNode<?, ?> normalizedNode = 
> TTPUtils.normalizedNodeFromJsonString(jsonInput);
> DataObject obj = BindingNormalizedNodeCodecRegistry.fromNormalizedNode(
>     YangInstanceIdentifier.of(BindingReflections.findQName(NetworkTopology.class)),
>     normalizedNode);
> NetworkTopology nt = (NetworkTopology) obj;
> Node node = nt.getTopology().get(0).getNode().get(0);
> return node;
> "
> 
> Is this what you guys meant by working with NormalizedNodes or is there a 
> better way?

Yes, pretty much. Since you mentioned getting the data from DCN -- you could 
use DOMDataBroker's DOMDataTreeChangeService extension to get the data in 
NormalizedNode format, skipping one conversion (and talking to JSON codec 
directly).

Bye,
Robert



Re: [controller-dev] Invoking MD-SAL RPC's from outside the cluster

2016-08-25 Thread Muthukumaran K
Hi Guy ,

>>> Is it enough that the external ODL and the 3 ODLs will be connected with an 
>>> AKKA cluster, or does it have to be part of the ODL cluster, or maybe it's 
>>> just the same?
For routed RPCs, it has to be part of the ODL cluster, because the routee address map 
is synchronized via ODL's own gossip mechanism, using the Akka cluster as 'transport'.

>>> in an async way.
Ok, got it. Async is the catch in your case. So an alternative like a RESTCONF-based 
RPC (which I had suggested in my previous mail) may also be of no significant use, 
unless the RPC output semantics are something on the lines of 'request accepted' - 
i.e. if at least that level of synchronous call is tolerable.

Regards
Muthu



From: Sela, Guy [mailto:guy.s...@hpe.com]
Sent: Thursday, August 25, 2016 1:25 PM
To: Muthukumaran K; mdsal-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Cc: Alfasi, Shlomi; Cohen, Elad
Subject: RE: Invoking MD-SAL RPC's from outside the cluster

Hi Muthu,

I'm familiar with the Routed-RPC model.
The usecase I'm talking about is just a normal RPC, not a routed one.
I want the external ODL to invoke an RPC on each of the 3 ODLs in the cluster.
I want to focus on what you said in the end.
Is it enough that the external ODL and the 3 ODLs will be connected with an 
AKKA cluster, or does it have to be part of the ODL cluster, or maybe it's just 
the same?
In our deployment, we are talking about a multi-site deployment.
It means we have N ODL-clusters, and a single ODL for monitoring and 
orchestration of the clusters.
I want this single/external ODL to be able to invoke remote methods on each ODL 
instance in each cluster, in an async way.
I don't think I can really use the standard ODL tools, because it means I need 
to hold a reference to an RPCRegistry, and from my understanding, the scope of 
the RPCRegistry is the cluster-scope.

From: Muthukumaran K [mailto:muthukumara...@ericsson.com]
Sent: Thursday, August 25, 2016 10:41 AM
To: Sela, Guy <guy.s...@hpe.com>; 
mdsal-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Cc: Alfasi, Shlomi <shlomi.alf...@hpe.com>; 
Cohen, Elad <elad.coh...@hpe.com>
Subject: RE: Invoking MD-SAL RPC's from outside the cluster

Hi Sela,

You might want to look into the Routed-RPC option for this if your performance 
requirements are not so severe.

To illustrate: in OpenFlowPlugin, every switch in effect exposes an RPC and 
registers itself with the central routed-RPC registry. If a flow has to be 
provisioned from Node A but the switch is connected to Node B (or Node B is the 
master of the switch, in OF high-availability lingo), the RPC call from Node A gets 
routed by the MD-SAL infrastructure to Node B's RPC provider for the given switch.

I am not sure about your deployment model. But an important prerequisite for the 
above to work is that both RPC consumer(s) and RPC provider(s) must be on the same 
Akka cluster.

Hope this helps

Regards
Muthu


From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Sela, Guy
Sent: Thursday, August 25, 2016 1:01 PM
To: mdsal-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Cc: Alfasi, Shlomi; Cohen, Elad
Subject: [controller-dev] Invoking MD-SAL RPC's from outside the cluster

Hi,
Let's say I have a cluster of 3 ODLs, and I have another external ODL that is 
not part of the Cluster.
Can this external ODL invoke RPCs on the ODLs in the cluster?
The current MD-SAL RPC framework allows me to invoke RPCs on remote machines in 
an async way, giving me a Future.
I want the same capability for a machine that doesn't live in the cluster.
If the answer is no, what do you think will be the best way to do it?

What we are trying to achieve is very similar to this library: 
https://github.com/barakb/asyncrmi/

Thanks,
Guy Sela



Re: [controller-dev] Invoking MD-SAL RPC's from outside the cluster

2016-08-25 Thread Muthukumaran K
Sorry I missed out the part
I have another external ODL that is *not part of the Cluster*

In such a case, invoking the RPC via RESTCONF with an HA proxy (just to ensure the 
RESTCONF call lands on only one of the 3 nodes) would be an option. The RPC consumer 
must be able to create the required input JSON and be able to interpret the RPC 
output JSON as per the provider's RPC YANG model.
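A sketch of building such a RESTCONF RPC invocation with the JDK 11 HTTP client. The host, port, module, and rpc names are made-up assumptions, and the /restconf/operations/ path follows the draft-RESTCONF convention ODL used at the time; the request is only built here, not sent:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RestconfRpcRequestSketch {
    // Builds the HTTP request for invoking a YANG RPC via RESTCONF. The JSON
    // body must match the rpc's "input" statement in the provider's YANG
    // model, and the response body carries the rpc "output" as JSON.
    static HttpRequest rpcRequest(String haProxyHost, String module, String rpc,
                                  String inputJson) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + haProxyHost
                        + ":8181/restconf/operations/" + module + ":" + rpc))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(inputJson))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical HA-proxy host and rpc; the proxy pins the call to one node.
        HttpRequest req = rpcRequest("ha-proxy.example.net", "example-module",
                "example-rpc", "{\"input\": {\"arg\": \"value\"}}");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Sending it would be one `HttpClient.newHttpClient().send(req, ...)` call; authentication and error handling are deployment-specific.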

A better alternative, if an HA proxy is not preferred: there is a recent addition to 
ODL named the Cluster Wide Singleton service - design document - 
https://bugs.opendaylight.org/attachment.cgi?id=1059

You may want to take a look at this enhancement - 
https://bugs.opendaylight.org/show_bug.cgi?id=5421

Regards
Muthu



From: Muthukumaran K
Sent: Thursday, August 25, 2016 1:11 PM
To: 'Sela, Guy'; mdsal-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Cc: Alfasi, Shlomi; Cohen, Elad
Subject: RE: Invoking MD-SAL RPC's from outside the cluster

Hi Sela,

You might want to look into the Routed-RPC option for this if your performance 
requirements are not so severe.

To illustrate: in OpenFlowPlugin, every switch in effect exposes an RPC and 
registers itself with the central routed-RPC registry. If a flow has to be 
provisioned from Node A but the switch is connected to Node B (or Node B is the 
master of the switch, in OF high-availability lingo), the RPC call from Node A gets 
routed by the MD-SAL infrastructure to Node B's RPC provider for the given switch.

I am not sure about your deployment model. But an important prerequisite for the 
above to work is that both RPC consumer(s) and RPC provider(s) must be on the same 
Akka cluster.

Hope this helps

Regards
Muthu


From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Sela, Guy
Sent: Thursday, August 25, 2016 1:01 PM
To: mdsal-...@lists.opendaylight.org; 
controller-dev@lists.opendaylight.org
Cc: Alfasi, Shlomi; Cohen, Elad
Subject: [controller-dev] Invoking MD-SAL RPC's from outside the cluster

Hi,
Let's say I have a cluster of 3 ODLs, and I have another external ODL that is 
not part of the Cluster.
Can this external ODL invoke RPCs on the ODLs in the cluster?
The current MD-SAL RPC framework allows me to invoke RPCs on remote machines in 
an async way, giving me a Future.
I want the same capability for a machine that doesn't live in the cluster.
If the answer is no, what do you think will be the best way to do it?

What we are trying to achieve is very similar to this library: 
https://github.com/barakb/asyncrmi/

Thanks,
Guy Sela



Re: [controller-dev] Invoking MD-SAL RPC's from outside the cluster

2016-08-25 Thread Muthukumaran K
Hi Sela,

You might want to look into the Routed-RPC option for this if your performance 
requirements are not so severe.

To illustrate: in OpenFlowPlugin, every switch in effect exposes an RPC and 
registers itself with the central routed-RPC registry. If a flow has to be 
provisioned from Node A but the switch is connected to Node B (or Node B is the 
master of the switch, in OF high-availability lingo), the RPC call from Node A gets 
routed by the MD-SAL infrastructure to Node B's RPC provider for the given switch.

I am not sure about your deployment model. But an important prerequisite for the 
above to work is that both RPC consumer(s) and RPC provider(s) must be on the same 
Akka cluster.

Hope this helps

Regards
Muthu


From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Sela, Guy
Sent: Thursday, August 25, 2016 1:01 PM
To: mdsal-...@lists.opendaylight.org; controller-dev@lists.opendaylight.org
Cc: Alfasi, Shlomi; Cohen, Elad
Subject: [controller-dev] Invoking MD-SAL RPC's from outside the cluster

Hi,
Let's say I have a cluster of 3 ODLs, and I have another external ODL that is 
not part of the Cluster.
Can this external ODL invoke RPCs on the ODLs in the cluster?
The current MD-SAL RPC framework allows me to invoke RPCs on remote machines in 
an async way, giving me a Future.
I want the same capability for a machine that doesn't live in the cluster.
If the answer is no, what do you think will be the best way to do it?

What we are trying to achieve is very similar to this library: 
https://github.com/barakb/asyncrmi/

Thanks,
Guy Sela



Re: [controller-dev] [mdsal-dev] Serialize/Deserialize DTOs to JSON

2016-08-23 Thread Muthukumaran K
Since a NormalizedNode is a prerequisite for serialization into JSON, this test 
seems to demonstrate how to get an entry from a keyed list: 
https://github.com/opendaylight/yangtools/blob/master/yang/yang-data-codec-gson/src/test/java/org/opendaylight/yangtools/yang/data/codec/gson/NormalizedNodeToJsonStreamTest.java
 - particularly 'keyedListNodeInContainer()'.

-Original Message-
From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Sela, Guy
Sent: Tuesday, August 23, 2016 7:52 PM
To: Robert Varga; controller-dev@lists.opendaylight.org; 
mdsal-...@lists.opendaylight.org
Subject: Re: [controller-dev] [mdsal-dev] Serialize/Deserialize DTOs to JSON

Sorry, I meant to ask about an entry of a list, not a leaf node.
I mean, let's say I have this yang model:
"
container elan-instances {
    description
        "elan instances configuration parameters. Elan instances support
         both the VLAN and VNI based elans.";

    list elan-instance {
        max-elements "unbounded";
        min-elements "0";
        key "elan-instance-name";
        description
            "Specifies the name of the elan instance. It is a string of 1 to 31
             case-sensitive characters.";
        leaf elan-instance-name {
        ...
"
Can I convert a specific elan-instance? Or do I have to convert something 
bigger?

-Original Message-
From: Sela, Guy
Sent: Tuesday, August 23, 2016 4:49 PM
To: 'Robert Varga' ; controller-dev@lists.opendaylight.org; 
mdsal-...@lists.opendaylight.org
Subject: RE: [mdsal-dev] Serialize/Deserialize DTOs to JSON

From what I'm seeing so far, I can't convert a leaf node into a JSON string; the 
only entities I can convert are containers or lists.
Is this correct?

I got this from:

private SchemaTracker(final SchemaContext context, final SchemaPath path) {
    SchemaNode current = SchemaUtils.findParentSchemaOnPath(context, path);
    Preconditions.checkArgument(current instanceof DataNodeContainer,
        "Schema path must point to container or list or an rpc input/output. Supplied path %s pointed to: %s",
        path, current);
    root = (DataNodeContainer) current;
}

when trying to create a JSONNormalizedNodeStreamWriter.

-Original Message-
From: Robert Varga [mailto:n...@hq.sk]
Sent: Sunday, July 24, 2016 4:52 PM
To: Sela, Guy ; controller-dev@lists.opendaylight.org; 
mdsal-...@lists.opendaylight.org
Subject: Re: [mdsal-dev] Serialize/Deserialize DTOs to JSON

On 07/24/2016 01:36 PM, Sela, Guy wrote:
> Hi,
> 
> Is there an API that I can call which gets a DataObject as input and 
> returns a JSON representation of it?

Not directly, as DataObject and the related generated code are an access facade. 
The primary data representation is NormalizedNode, hence you need to transform the 
DTOs to NormalizedNodes (see mdsal-binding-dom-codec) and then use 
yang-data-codec-gson (or -xml) to get the representation you seek.

I think TTPUtils (in TTP) does exactly that.

Bye,
Robert



[controller-dev] Clarification on Clusterwide-Singleton and inJeopardy behavior

2016-08-19 Thread Muthukumaran K
Hi,

While looking at the step-down part of 
ClusterSingletonServiceGroupImpl#lostOwnership, I had a related doubt.

So, as long as the inJeopardy situation does not recover, do we not have any 
instance of the singleton running across the cluster, or can one instance in the 
majority partition (2 nodes out of 3) potentially get bootstrapped as the 
singleton?

Just trying to understand the behavior of cluster-wide singletons in the context 
of inJeopardy, apart from the relinquishing of current ownership.

Regards
Muthu





Re: [controller-dev] OpenDaylight inventory Yang model status

2016-08-14 Thread Muthukumaran K
Hi Nathan, 

The idea was to use a generic standards-based topology model instead of the 
opendaylight inventory model. This way, individual southbound modules can use a 
unified model.

There was a dichotomy in terms of which standard topology model to choose. 
Actually, there are two models to consider for this migration - [1] and [2]. 
Deciding which model to choose across the projects, while still meeting release 
schedules, posed a few challenges, as discussed in [3].

[1] https://tools.ietf.org/html/draft-clemm-netmod-yang-network-topo-01
[2] https://tools.ietf.org/html/draft-ietf-i2rs-yang-network-topo-01
[3] https://lists.opendaylight.org/pipermail/tsc/2015-September/003837.html


Regards
Muthu




-Original Message-
From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Nathan 
Sowatskey
Sent: Sunday, August 14, 2016 12:37 PM
To: controller-dev
Subject: [controller-dev] OpenDaylight inventory Yang model status

Hi

I am trying to explain a slice through the ODL stack based on the DLUX Nodes 
application and the opendaylight-inventory:nodes resource RESTCONF API. This is 
part of a training exercise that I am developing.

I can see that the opendaylight-inventory:nodes RESTCONF API is used by the 
DLUX Nodes application, so I am guessing that the API is current and relevant.

The Nodes code that uses the opendaylight-inventory:nodes is:

dlux/modules/node-resources/src/main/resources/node/nodes.services.js

node.factory('NodeInventorySvc', function(NodeRestangular) {
var svc = {
  base: function() {
return 
NodeRestangular.one('restconf').one('operational').one('opendaylight-inventory:nodes');
  },
  data : null
};

When I look at the YANG models for opendaylight-inventory and 
netconf-node-inventory, though, I see “status deprecated” everywhere. That 
makes me suspect that this model, and so the APIs based on it, either are no 
longer current and relevant, or are planned to be retired.

The Yang file I am looking at is: 
controller/opendaylight/model/model-inventory/src/main/yang/opendaylight-inventory.yang
 

typedef support-type {
status deprecated;

typedef node-id {
status deprecated;

and so on.

I know that this is all in flux and so on, but could anyone shed any light on 
the actual and proposed status here please?

Many thanks

Nathan
—
Nathan John Sowatskey
Consulting Engineer - Programmable Infrastructure, DevOps, IoT and SDN 
nat...@nathan.to www.linkedin.com/in/nathandevops
XMPP: nathando...@im.koderoot.net
Google: nathanjohnsowats...@gmail.com
Skype: nathan_sowatskey
Twitter: NathanDotTo
GitHub: https://github.com/DevOps4Networks
http://www.kipling.org.uk/poems_if.htm

___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev


Re: [controller-dev] [mdsal-dev] Serialize/Deserialize DTOs to JSON

2016-08-11 Thread Muthukumaran K
Hi Sela, 

We had some wrappers around Flow/GroupEntity for similar purposes; such 
wrappers can be seen in the Genius MD-SAL Utils. I had experimented with these 
wrappers and Kryo, and it worked without issues.

My experiments were mainly around separating apps from OFplugin into separate 
JVMs. Of course, that was just a personal experiment, nothing official :-)

Regards
Muthu


-Original Message-
From: Sela, Guy [mailto:guy.s...@hpe.com] 
Sent: Thursday, August 11, 2016 1:32 PM
To: Muthukumaran K; Robert Varga; controller-dev@lists.opendaylight.org; 
mdsal-...@lists.opendaylight.org
Subject: RE: [mdsal-dev] Serialize/Deserialize DTOs to JSON

Just a quick update on this issue.
I'm still using the DTOs rather than NormalizedNode because I'm working on a 
demo. With the correct instantiator strategy in Kryo, you can pass the DTOs 
themselves rather than the Builders:

kryo.setInstantiatorStrategy(new Kryo.DefaultInstantiatorStrategy(new 
StdInstantiatorStrategy()));

This still won't work if you just send the DTO on its own. But if you wrap it 
in your own "message" object, one that exposes a no-args constructor, you can 
send the DTO as is.

-Original Message-
From: Sela, Guy
Sent: Tuesday, July 26, 2016 1:01 PM
To: 'Muthukumaran K' ; Robert Varga ; 
controller-dev@lists.opendaylight.org; mdsal-...@lists.opendaylight.org
Subject: RE: [mdsal-dev] Serialize/Deserialize DTOs to JSON

Can you elaborate on what exactly you mean by creating a codec per DTO?

In my use case, I want to send an instance of the DTO "X" between sites:
* I create a new XBuilder initialized with the instance of X.
* I use Kryo to serialize the XBuilder on Site 1.
* I send it to Site 2.
* Site 2 expects to receive an XBuilder, so it just uses Kryo to deserialize 
the byte array into an XBuilder.
* It calls build() on the XBuilder.
* It gets back a cloned instance of X.



-----Original Message-
From: Muthukumaran K [mailto:muthukumara...@ericsson.com]
Sent: Tuesday, July 26, 2016 9:43 AM
To: Sela, Guy ; Robert Varga ; 
controller-dev@lists.opendaylight.org; mdsal-...@lists.opendaylight.org
Subject: RE: [mdsal-dev] Serialize/Deserialize DTOs to JSON

Hi Guy, 

You are correct. That was a quick-and-dirty hack I did for a demo. As I said, 
it would not scale, since we would have to create a codec for every 
Binding-Aware DTO. That's why I dropped the idea.

Regards
Muthu



-Original Message-
From: Sela, Guy [mailto:guy.s...@hpe.com]
Sent: Monday, July 25, 2016 12:45 PM
To: Muthukumaran K; Robert Varga; controller-dev@lists.opendaylight.org; 
mdsal-...@lists.opendaylight.org
Subject: RE: [mdsal-dev] Serialize/Deserialize DTOs to JSON

Quick guess: Did you just pass the Builders instead?


-Original Message-
From: mdsal-dev-boun...@lists.opendaylight.org 
[mailto:mdsal-dev-boun...@lists.opendaylight.org] On Behalf Of Sela, Guy
Sent: Monday, July 25, 2016 10:12 AM
To: Muthukumaran K ; Robert Varga ; 
controller-dev@lists.opendaylight.org; mdsal-...@lists.opendaylight.org
Subject: Re: [mdsal-dev] Serialize/Deserialize DTOs to JSON

How did you manage to work with Kryo, given that the DTOs don't offer a 
no-args constructor?


-Original Message-
From: Muthukumaran K [mailto:muthukumara...@ericsson.com]
Sent: Monday, July 25, 2016 8:56 AM
To: Sela, Guy ; Robert Varga ; 
controller-dev@lists.opendaylight.org; mdsal-...@lists.opendaylight.org
Subject: RE: [mdsal-dev] Serialize/Deserialize DTOs to JSON

As a naïve experiment, I had tried Kryo serialization. It did work for basic 
serialization and deserialization, but that was nothing serious. There are some 
dangers in using Kryo, so I dropped the idea.

Regards
Muthu


-Original Message-
From: controller-dev-boun...@lists.opendaylight.org 
[mailto:controller-dev-boun...@lists.opendaylight.org] On Behalf Of Sela, Guy
Sent: Sunday, July 24, 2016 8:36 PM
To: Robert Varga; controller-dev@lists.opendaylight.org; 
mdsal-...@lists.opendaylight.org
Subject: Re: [controller-dev] [mdsal-dev] Serialize/Deserialize DTOs to JSON

Thanks Robert.

And what about just serializing/deserializing DTOs in an efficient way, e.g. 
with Protocol Buffers? Is there something like that implemented?
 

-Original Message-
From: Robert Varga [mailto:n...@hq.sk]
Sent: Sunday, July 24, 2016 4:52 PM
To: Sela, Guy ; controller-dev@lists.opendaylight.org; 
mdsal-...@lists.opendaylight.org
Subject: Re: [mdsal-dev] Serialize/Deserialize DTOs to JSON

On 07/24/2016 01:36 PM, Sela, Guy wrote:
> Hi,
> 
> Is there an API that I can call which gets a DataObject as input and 
> returns a JSON representation of it?

Not directly, as DataObject and the related generated code are an access 
facade. The primary data representation is NormalizedNode, hence you need to 
transform the DTOs to NormalizedNodes (see mdsal-binding-dom-codec) and then 
use yang-data-codec-gson (or -xml) to get the representation you seek.

I think TTPUtils (in TTP) does exactly that.

Bye,
Robert


Re: [controller-dev] What is the condition under which IllegalStateException - "store tree and candidate base differ" is thrown

2016-08-05 Thread Muthukumaran K
Sure, Tom. I have added myself as a reviewer. I read up on the Noop approach 
described in the thesis paper, and Faiz and I had a detailed discussion this 
morning.

Overall the patch looks functionally complete. I will now go into a detailed 
review.

Regards
Muthu



From: Tom Pantelis [mailto:tompante...@gmail.com]
Sent: Thursday, August 04, 2016 11:30 PM
To: Muthukumaran K
Cc: controller-dev@lists.opendaylight.org
Subject: Re: [controller-dev] What is the condition under which 
IllegalStateException - "store tree and candidate base differ" is thrown

It means preCommit/commit occurred out of order, e.g.:

preCommit txn 1
preCommit txn 2
commit txn 2
commit txn 1

It's an internal problem in the CDS. We put a hack into Beryllium to work 
around it. This patch on master, https://git.opendaylight.org/gerrit/#/c/42728/, 
explains the scenario and implements a proper solution.
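The failure mode can be sketched with a toy model. `Candidate`, the version strings, and the `commit` check below are hypothetical simplifications of what InMemoryDataTree actually tracks, meant only to show why committing out of order trips the check:

```java
import java.util.Objects;

// Toy model of the "store tree and candidate base differ" check:
// commit() is only legal when the candidate was prepared against the
// current committed tree, so reordering commits breaks the chain.
public class OutOfOrderCommit {
    static final class Candidate {
        final String base;  // tree version the txn was validated against
        final String next;  // tree version the txn produces
        Candidate(String base, String next) { this.base = base; this.next = next; }
    }

    static String tip = "v0";  // current committed tree

    static void commit(Candidate c) {
        if (!Objects.equals(tip, c.base)) {
            throw new IllegalStateException("store tree and candidate base differ");
        }
        tip = c.next;
    }

    public static void main(String[] args) {
        Candidate txn1 = new Candidate("v0", "v1");  // prepared on v0
        Candidate txn2 = new Candidate("v1", "v2");  // prepared on txn1's result
        try {
            commit(txn2);   // out of order: the store tip is still v0, not v1
            commit(txn1);   // never reached
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Committing txn1 first and then txn2 would succeed, since each candidate's base would then match the tip at commit time.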

Muthu - feel free to review as I can't add you :)


On Thu, Aug 4, 2016 at 12:29 PM, Muthukumaran K 
<muthukumara...@ericsson.com> wrote:
Hi,

I want to understand under which condition this exception, seen in 
InMemoryDataTree.java#commit, is thrown.

Does this exception mean that a child is being added by one Txn for a parent 
which could have been deleted by another Txn?

I checked a related bug in OVSDB, 
https://bugs.opendaylight.org/show_bug.cgi?id=5062, which shows a similar stack 
trace, but I could not deduce the condition that triggers this exception.

Regards
Muthu


___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev



[controller-dev] What is the condition under which IllegalStateException - "store tree and candidate base differ" is thrown

2016-08-04 Thread Muthukumaran K
Hi,

I want to understand under which condition this exception, seen in 
InMemoryDataTree.java#commit, is thrown.

Does this exception mean that a child is being added by one Txn for a parent 
which could have been deleted by another Txn?

I checked a related bug in OVSDB, 
https://bugs.opendaylight.org/show_bug.cgi?id=5062, which shows a similar stack 
trace, but I could not deduce the condition that triggers this exception.

Regards
Muthu

___
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev