Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?
Sjoerd, Guillaume, Simon,

What does proper notification mean? In which cases does it happen? Probably this is not the case if an XO moves slowly to a place with poor connectivity.

In the case of a temporary (short) disruption of connectivity, how much time does it generally take for it to return? You mentioned that in the past XOs were appearing and disappearing constantly. This implies that the common drop of connectivity is on the scale of a few seconds. If it is lost for more than a few minutes, then it is not bad for the XO to leave and return. So I believe that 1h or even 10min are too long timeouts.

There are a couple more things I would like to address:

1. Is there a way to restart the presence service? In that way we can resolve a weird state. Will killing and restarting the process work?

2. At what point in the source code will the presence service
i. try to connect to the jabber server?
ii. run gabble?

3. I noticed the dbus diagram is updated. Indeed we have a better picture of what's happening. But we still need some more information, like:
i. a state diagram of the presence service
ii. what type of communication is taking place between NM and PS
iii. when the connection is switched from linklocal to schoolserver (for example), what steps are taking place in the presence service
iv. whether internet connectivity is detected by NM and sent to PS, or detected by PS

yani

On 10/30/07, Sjoerd Simons [EMAIL PROTECTED] wrote:

On Fri, Oct 26, 2007 at 02:48:55PM -0400, Giannis Galanis wrote:

Sjoerd, I would like to ask you, you replied at one of the bugs:

Moving from a bugreport to a private mail might not be a great idea.. Could you in the future just put your questions in the bugreport so we can have the discussion in a more public fashion :)

Salut used to drop the presence of people for which it couldn't resolve the extra information, but this seemed to give a lot of problems in the mesh (people appearing and disappearing all the time). So as a workaround we switched to only dropping presence iff all info about a node has gone. This has the downside that nodes that are really gone can still appear on the mesh view for some time (specifically when they didn't send a proper mdns bye packet or when that was dropped).

iff all info about a node has gone what does this mean?

It means that it is hard to decide when a node has really gone or if the network link to a certain node is just (temporarily) bad. In the OLPC office, the second case apparently happens a lot.

how often do you refresh?

The refresh is done by avahi. Avahi tries every few minutes. Guillaume worked on a patch to make the effect of being unsure about a user less bad (as in, assume that if you're unsure about them for a certain period of time, they're actually really gone). It still needs to be finished though.

Which means, from an end user's point of view, that if a user went away without doing proper notification, then they will only stay on the meshview for a limited amount of time (say a maximum of 10 minutes instead of the current situation of more than an hour).

Sjoerd
--
Kindness is the beginning of cruelty. -- Muad'dib [Frank Herbert, Dune]
___ Devel mailing list Devel@lists.laptop.org http://lists.laptop.org/listinfo/devel
Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?
Giannis Galanis wrote:

1. Is there a way to restart the presence service? In that way we can resolve a weird state. Will killing and restarting the process work?

Killing it will result in it being restarted. However, Sugar remains in an inconsistent state, with buddies stuck on the mesh view if they left while PS was not running, since PS never told Sugar that they left.

2. At what point in the source code will the presence service i. try to connect to the jabber server?

If the server plugin detects that NM has an IP address. See below for details.

ii. run gabble?

The connection manager plugins are both started at startup, unless PRESENCE_SERVICE_DEBUG=disable-gabble or disable-salut is set.

3. I noticed the dbus diagram is updated. Indeed we have a better picture of what's happening. But we still need some more information, like: i. a state diagram of the presence service

PS was designed to run with both plugins (which talk to gabble and salut) running concurrently. However, due to confusion about server and link local activities displayed on the same mesh view with no differentiation, and problems in connecting to some shared activities you can see presence for, that was disabled in the run-up to Trial3.

* Both plugins are started when PS starts.
* Salut succeeds faster than Gabble, so link local buddies are shown first.
* If PS gets an IP address from NM, it starts the server plugin (gabble).
* When the server plugin has started, PS stops the link local plugin (salut). Link local buddies disappear.
* If NM loses the IP address, the server plugin stops itself.
* When the server plugin stops, PS starts the link local plugin.

Is that sufficient for your state diagram? There's a lot more that happens, with async calls, so a complete state diagram would be rather complex.

ii. what type of communication is taking place between NM and PS
iii. when the connection is switched from linklocal to schoolserver (for example), what steps are taking place in the presence service
iv. whether internet connectivity is detected by NM and sent to PS, or detected by PS

PS's server plugin watches NM signals on the D-Bus system bus.

Regards
Morgan
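The plugin switching Morgan lists can be read as a small state machine. The Python sketch below is only an illustration of that reading, not code from the Presence Service: the names Event, PluginState and step are hypothetical, and the "idle" / "connecting" / "connected" labels for the server plugin are an assumption made here to keep the transitions explicit. The real service has many more asynchronous steps, as noted above.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Event(Enum):
    """Hypothetical event names for the transitions in the list above."""
    IP_ACQUIRED = 1       # NM reports that we have an IP address
    SERVER_CONNECTED = 2  # the server plugin (gabble) has started
    IP_LOST = 3           # NM reports that the IP address is gone


@dataclass(frozen=True)
class PluginState:
    # Salut succeeds first, so link local buddies are shown initially.
    link_local_active: bool = True
    # Assumed labels: "idle", "connecting" or "connected".
    server: str = "idle"


def step(state, event):
    """Return the next state; unknown combinations leave the state unchanged."""
    if event is Event.IP_ACQUIRED and state.server == "idle":
        # PS gets an IP address from NM and starts the server plugin.
        return replace(state, server="connecting")
    if event is Event.SERVER_CONNECTED and state.server == "connecting":
        # Server plugin is up: PS stops salut, link local buddies disappear.
        return PluginState(link_local_active=False, server="connected")
    if event is Event.IP_LOST:
        # Server plugin stops itself, then PS starts the link local plugin.
        return PluginState(link_local_active=True, server="idle")
    return state


if __name__ == "__main__":
    state = PluginState()
    for event in (Event.IP_ACQUIRED, Event.SERVER_CONNECTED, Event.IP_LOST):
        state = step(state, event)
        print(event.name, "->", state)
```

Keeping the state immutable makes each transition easy to check in isolation, which is about as far as a sketch like this can go without modelling the async calls.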
Re: Compiler optimization for Geode, migrating to F8, build system
On Mon, 05 Nov 2007 at 15:13:39 -0500, Bernardo Innocenti wrote:

Yes. The problem is that building a package or two in a separate environment is feasible, but rebuilding a whole distribution from scratch is *hard*. It requires you to install and configure local copies of pilgrim, mock, createrepo, yum...

If this sort of thing is of interest to people, it's likely to be worth examining how Debian does it - there are regular efforts to rebuild all of Debian, usually for mass-bug-filing purposes (e.g. filing bugs against all packages that don't build correctly with a newer version of gcc, so that those bugs can be fixed before the newer version becomes the default).

Because of Debian's decentralized nature, much of its infrastructure is in a form that can be used to set up third-party repositories without any particular permission or help from Debian itself.

In particular, Debian's debootstrap package (used to construct a minimal chroot with a particular version of Debian) is so useful that it was adopted for use as the first stage of the official installer (the installer now uses a C reimplementation, cdebootstrap, but the principle is the same), and the schroot and sbuild packages (used to set up a chroot, and build packages in a chroot with automatic installation of build dependencies, respectively) are also very well constructed.

http://buildd.debian.org/ runs the official package autobuilders, but there are a number of third-party clones running the same software, such as http://experimental.ftbfs.de/.

Simon
Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?
In reply to your previous mail, iff means if and only if. It's often used by mathematicians.

On Tue, 06 Nov 2007 at 03:23:39 -0500, Giannis Galanis wrote:

What does proper notification mean? In which cases does it happen?

If Salut is explicitly asked to disconnect, it will tell Avahi to delete all its mDNS records (this actually consists of re-sending all the records it was advertising, with the Time To Live set to 0 seconds). This is sometimes referred to as a goodbye packet. See http://files.multicastdns.org/draft-cheshire-dnsext-multicastdns.txt section 11.2 Goodbye Packets.

The only time we'll currently do this is when switching off Salut because Gabble has connected successfully.

Probably this is not the case if an XO moves slowly to a place with poor connectivity.

This is never done in response to network conditions - we can't know that we've lost network connectivity until it's too late.

If the Time To Live on our mDNS records expires, that should have the same effect; however, as Sjoerd explained, we currently ignore that, because the 1CC mesh network is apparently unstable enough that the TTL sometimes expires even for laptops that are actually present.

In the case of a temporary (short) disruption of connectivity, how much time does it generally take for it to return? You mentioned that in the past XOs were appearing and disappearing constantly. This implies that the common drop of connectivity is on the scale of a few seconds.

You tell me! :-) I don't have enough XOs to replicate the conditions of a large mesh network like 1CC, so I can't comment on packet loss rates. Perhaps Dan Williams (who used to maintain Presence Service) could help you.

If it is lost for more than a few minutes, then it is not bad for the XO to leave and return. So I believe that 1h or even 10min are too long timeouts.

I believe we're currently using Avahi's default timeouts, which are those recommended in the mDNS draft (linked above).
If I'm right about that, then we're using 120 second TTLs for the SRV and A records. Assuming Salut and Avahi follow the draft's recommendations, this means that for the records representing activities, buddies and laptops, if we haven't seen an announcement of a particular record, we will:

- re-query after 96 - 98.4 seconds;
- if no reply, re-query after 102 - 104.4 seconds;
- if no reply, re-query after 114 - 116.4 seconds;
- if no reply, assume the record has vanished after 120 seconds.

(In each of the ranges given for the re-queries, the exact time is chosen at random, to avoid simultaneous queries from everyone in the network.)

The timeout is reset as soon as we see any announcement of a record.

The only ones whose disappearance matters are the SRV and A records - if a TXT record fails to disappear when it should, we don't really care. TXT records have a substantially longer timeout (the draft recommends 75 minutes).

There are a couple more things I would like to address: 1. Is there a way to restart the presence service? In that way we can resolve a weird state. Will killing and restarting the process work?

Only if client code that accesses the PS is amended to cope with this (I just filed #4681 to represent this). Until #4681 is closed, if the PS was restarted, nothing would work - use Ctrl+Alt+Backspace to restart all of Sugar. Please see the bug for more details or to reply.

2. At what point in the source code will the presence service i. try to connect to the jabber server? ii. run gabble?

I'll answer (ii.) first. Gabble is automatically run by the session bus (dbus-daemon) via service activation, the first time the Presence Service uses it, if it isn't already running. So there is no explicit code in the PS to run Gabble.

OK, now (i.): When Network Manager indicates that we have a valid IP address, we run the _init_connection method of the ServerPlugin instance.
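As a quick check of the re-query figures quoted earlier in this message, here is a minimal Python sketch. It assumes the ranges correspond to 80%, 85% and 95% of the TTL, each with up to 2% of the TTL added as random jitter; those fractions are inferred from the numbers in the message (96 - 98.4, 102 - 104.4, 114 - 116.4 for a 120 second TTL), not read from Avahi's source. The function and constant names are hypothetical.

```python
import random

# Assumed fractions of the TTL at which a re-query is attempted,
# inferred from the 120 second example in the message.
REQUERY_FRACTIONS = (0.80, 0.85, 0.95)
# Assumed jitter: up to 2% of the TTL is added to each re-query time.
JITTER = 0.02


def requery_windows(ttl):
    """Return (earliest, latest) seconds after the last announcement."""
    return [(ttl * f, ttl * (f + JITTER)) for f in REQUERY_FRACTIONS]


def pick_requery_times(ttl, rng=random):
    """Pick one random time inside each window, as a querier would."""
    return [rng.uniform(lo, hi) for lo, hi in requery_windows(ttl)]


if __name__ == "__main__":
    ttl = 120
    for lo, hi in requery_windows(ttl):
        print("re-query between %.1f and %.1f seconds" % (lo, hi))
    print("record assumed gone after %d seconds" % ttl)
```

With these assumptions the same function gives the windows for any other TTL, for example the much longer TXT record timeout mentioned above.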
If the Gabble connection fails, we schedule a timer (currently 5 seconds) and retry running _init_connection when the timer runs out. (classes TelepathyPlugin and ServerPlugin, methods _init_connection, _reconnect_cb, _could_connect, _handle_connection_status_change.)

What _init_connection does is: If there's already a Gabble connection and it's connected, it'll be used (class ServerPlugin, method _find_existing_connection). Otherwise we make a new connection (method _make_new_connection).

ServerPlugin (src/server_plugin.py) inherits from TelepathyPlugin (src/telepathy_plugin.py) so some of the methods I mentioned are defined in TelepathyPlugin, some in ServerPlugin, and some are defined in TelepathyPlugin but overridden in ServerPlugin.

ii. what type of communication is taking place between NM and PS

D-Bus messages, on the system bus.

iv. whether internet connectivity is detected by NM and sent to PS, or detected by PS

Internet connectivity isn't really detected, as such. The PS listens for signals from Network Manager that
Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?
On Tue, 06 Nov 2007 at 12:05:59 +0200, Morgan Collett wrote:

Giannis Galanis wrote:

1. Is there a way to restart the presence service? In that way we can resolve a weird state. Will killing and restarting the process work?

Killing it will result in it being restarted. However, Sugar remains in an inconsistent state, with buddies stuck on the mesh view if they left while PS was not running, since PS never told Sugar that they left.

The situation would actually be worse than that, see #4681.

Simon
xf86eqEnqueue
Hi everybody!

Does anyone have any idea why we get this console message?

SGIO not blocked at xt86eqEnqueue

This seems to happen in many (all?) builds I've been working with.

Thanks,
Ricardo Carrano
Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?
Thank you all for your replies. They clear up the picture a lot.

To summarize:

1. We need to fix the timeout for icons to disappear. Can we try Guillaume's patch? Also, we need to be able to determine which icons are currently not available (but still appearing). I believe that the failed entries in _presence._tcp are a complete list. Is this correct?

2. We need to be able to restart PS. As you say, this is not possible, but if we restart sugar, will PS restart as well?

3. We need to force gabble to run, or at least to force an attempt to connect to the jabber server. We have several instances of 4193 (almost all XOs connected to schoolserver, AP are running salut).

4. Is the process of trying to connect to the jabber server done by telepathy-gabble, or by the presence
Software status meeting on IRC (today, 21:00 EST Boston)
Hi,

We'll be having the regular software meeting on IRC (irc.freenode.net #olpc-meeting) tonight at 9pm EST. See you there!

Date/time (note daylight-savings has finished now):
http://www.timeanddate.com/worldclock/fixedtime.html?month=11&day=06&year=2007&hour=21&min=0&sec=0&p1=43

URL for update-1 blockers by owner:
http://tinyurl.com/2fuzpv

- Chris.
--
Chris Ball [EMAIL PROTECTED]
Activities downgraded in Joyride 247
See http://dev.laptop.org/~bert/joyride-pkgs.html Many activities were downgraded to older versions. - Bert -
Re: Activities downgraded in Joyride 247
On 11/6/07, Bert Freudenberg [EMAIL PROTECTED] wrote:

See http://dev.laptop.org/~bert/joyride-pkgs.html Many activities were downgraded to older versions.

Seems like I manually kicked off a build which stomped on a build-in-progress. Oops. =( I'm working on improvements to the process.

--scott
--
( http://cscott.net/ )
Bitfrost Activity Isolation being turn ON.
Dear @sugar and @devel,

The sugar, security, and release teams have agreed that the file-system isolation features provided by Rainbow [1, 2] are close enough to maturity to be turned on by default in Joyride.

Unfortunately, there will be serious short-term regressions caused by this change -- for example, activities will not receive any data when they are resumed until a blocking bug in datastore/Rainbow integration [#3801] is resolved.

We are committed to fixing these regressions in a timely fashion. However, for this to happen, we need your help to find and diagnose them. Please try to run the software through its paces so that we can make informed decisions about how much isolation we can actually ship. Also, please be prepared to update to the latest joyride snapshot on a frequent basis as we discover and fix the worst regressions.

Finally, in order to help you determine whether bugs that you observe are being caused by an interaction with isolation, you need to know the procedure for turning it on and off.

TO TURN OFF ISOLATION:

From a console, logged in as root:

    rm /etc/olpc-security
    reboot

TO RE-ENABLE ISOLATION:

From a console, logged in as root:

    touch /etc/olpc-security
    reboot

[It actually suffices to restart X after modifying /etc/olpc-security, but doing a full reboot will let you start off from a known good state.]

Thanks,
Michael, Marco, Tomeu, Ivan, Kim, Jim, and Walter.

[1]: http://wiki.laptop.org/go/Rainbow -- overview page
[2]: http://wiki.laptop.org/go/Taste_the_Rainbow -- source code tour
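When filing a bug it can help to record which mode the machine was in. The Python sketch below assumes, as the procedure above implies, that isolation is enabled exactly when /etc/olpc-security exists; it only looks at the file and says nothing about how Rainbow itself consults it. The function name isolation_enabled is hypothetical.

```python
import os

# Marker file named in the announcement above.
SECURITY_MARKER = "/etc/olpc-security"


def isolation_enabled(marker=SECURITY_MARKER):
    """Return True if the marker file exists (assumed to mean isolation is on)."""
    return os.path.exists(marker)


if __name__ == "__main__":
    state = "enabled" if isolation_enabled() else "disabled"
    print("Activity isolation appears to be %s" % state)
```

Remember that a change to the marker only takes effect after X is restarted or the machine is rebooted, so the file reflects the next session rather than necessarily the current one.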
Re: Bitfrost Activity Isolation being turn ON.
Subject: Bitfrost Activity Isolation being turn ON.

BITFROST TURN ON. LAUNCH EVERY ZIG. MAKE YOUR TIME. :-)

--elijah
Re: Shutting down fds prior to execvpe in rainbow/inject.py: joyride 247 under Qemu
Marcus Leech writes:

I experimentally put some code just before the execvpe() in inject.py to close FDs >= 3 and <= 10. I picked 10 out of the air, but I wouldn't expect there to be many open file descriptors at that point.

Actually, given the semantics of dup(), you could use it to probe what the maximum FD number is just before execvpe(), so the terminating condition could be something like <= dup(0).

I don't see how dup() would help you. Remember, you could get back fd 123 even if fd 12345 was the last one allocated and is still in use. You get the lowest free fd.

You can do readdir() on /proc/self/fd to list them, being careful to not close the fd used for reading the directory until you have read the whole directory.
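The /proc/self/fd approach can be sketched in Python, the language inject.py is written in. This is a minimal, Linux-only illustration rather than the code Rainbow uses: list_open_fds and close_fds_except are hypothetical names. os.listdir() reads the whole directory before returning and then closes its own directory fd, so that fd may show up in the listing even though it is already gone; the sketch filters it out with os.fstat().

```python
import os


def list_open_fds():
    """Return the sorted fds that are still open after the listing finishes."""
    names = os.listdir("/proc/self/fd")  # whole directory is read here
    fds = []
    for name in names:
        fd = int(name)
        try:
            os.fstat(fd)
        except OSError:
            # Typically the temporary fd that was used to read the directory.
            continue
        fds.append(fd)
    return sorted(fds)


def close_fds_except(keep=(0, 1, 2)):
    """Close every open fd not listed in keep; return the fds that were closed."""
    keep = set(keep)
    closed = []
    for fd in list_open_fds():
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            continue  # closed by someone else in the meantime
        closed.append(fd)
    return closed


if __name__ == "__main__":
    r, w = os.pipe()
    print("open before:", list_open_fds())
    print("closed:", close_fds_except())
    print("open after:", list_open_fds())
```

Unlike a fixed upper bound such as 10, this closes only descriptors that really exist, however high their numbers are, which addresses the objection to probing with dup().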