Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-06 Thread Giannis Galanis
Sjoerd, Guillaume, Simon,

What does "proper notification" mean? In which cases does it happen?

Presumably it does not happen when an XO moves slowly to a place with poor
connectivity.

In the case of a temporary (short) disruption of connectivity, how much
time does it generally take for it to return? You mentioned that in the past
XOs were appearing and disappearing constantly. This implies that the
common drop of connectivity is on the scale of a few seconds. If it is lost
for more than a few minutes, then it is not bad for the XO to leave and
return. So I believe that 1h or even 10min are too long as timeouts.

There are a couple more things I would like to address:

1. Is there a way to restart the presence service? That way we can
resolve a weird state. Will killing and restarting the process work?

2. At what point in the source code does the presence service
i. try to connect to the jabber server?
ii. run gabble?

3. I noticed the dbus diagram is updated. Indeed we have a better picture of
what's happening. But we still need some more information, like:
i. a state diagram of the presence service
ii. what type of communication is taking place between NM and PS
iii. when the connection is switched from linklocal to schoolserver (for
example), what steps are taking place in the presence service
iv. whether internet connectivity is detected by NM and sent to PS, or
detected by PS

yani




On 10/30/07, Sjoerd Simons [EMAIL PROTECTED]  wrote:

 On Fri, Oct 26, 2007 at 02:48:55PM -0400, Giannis Galanis wrote:
   Sjoerd,
 
  I would like to ask you,
 
  you replied at one of the bugs:

 Moving from a bug report to a private mail might not be a great idea.
 Could you in the future just put your questions in the bug report so we
 can have the discussion in a more public fashion :)

  Salut used to drop the presence of people for which it couldn't resolve
  the extra information, but this seemed to give a lot of problems in the
  mesh (people appearing and disappearing all the time). So as a workaround
  we switched to only dropping presence iff all info about a node has gone.
  Which has the downside that nodes that are really gone can still appear
  on the mesh view for some time (specifically when they didn't send a
  proper mdns bye packet or when that was dropped).
 
  iff all info about a node has gone
  what does this mean?

 It means that it is hard to decide when a node has really gone or if the
 network link to a certain node is just (temporarily) bad.

 In the OLPC office, the second case apparently happens a lot.

  how often do you refresh?

 The refresh is done by avahi. Avahi tries every few minutes. Guillaume
 worked on a patch to make the effect of being unsure about a user less bad
 (as in, assume that if you're unsure about them for a certain period of
 time, they're actually really gone). It still needs to be finished though.

 Which means, from an end user's point of view, that if a user went away
 without doing proper notification, then they will only stay on the mesh
 view for a limited amount of time (say a maximum of 10 minutes instead of
 the current situation of more than an hour).



  Sjoerd
 --
 Kindness is the beginning of cruelty.
 -- Muad'dib [Frank Herbert, Dune]

___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-06 Thread Morgan Collett
Giannis Galanis wrote:
 1. Is there a way to restart the presence service? That way we can
 resolve a weird state. Will killing and restarting the process work?

Killing it will result in it being restarted. However, Sugar remains in
an inconsistent state, with buddies stuck on the mesh view if they left
while PS was not running, since PS never told Sugar that they left.

 2. At what point in the source code does the presence service
 i. try to connect to the jabber server?

If the server plugin detects that NM has an IP address. See below for
details.

 ii. run gabble?

The connection manager plugins are both started at startup, unless
PRESENCE_SERVICE_DEBUG=disable-gabble or disable-salut are set.

 3. I noticed the dbus diagram is updated. Indeed we have a better
 picture of what's happening. But we still need some more information, like:
 i. a state diagram of the presence service

PS was designed to run with both plugins (which talk to gabble and
salut) running concurrently. However, due to confusion about server and
link local activities being displayed on the same mesh view with no
differentiation, and problems in connecting to some shared activities
you can see presence for, that was disabled in the run-up to Trial3.

* Both plugins are started when PS starts.
* Salut succeeds faster than Gabble, so link local buddies are shown first.
* If PS gets an IP address from NM, it starts the server plugin (gabble).
* When the server plugin has started, PS stops the link local plugin
(salut). Link local buddies disappear.
* If NM loses the IP address, the server plugin stops itself.
* When the server plugin stops, PS starts the link local plugin.

Is that sufficient for your state diagram? There's a lot more that
happens, with async calls, so a complete state diagram would be rather
complex.
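
As a rough illustration only, the sequence listed above can be modelled as a
tiny state machine. The class, state, and method names here are hypothetical
and are not taken from the Presence Service source; the real service is
asynchronous and has many more transitions.

```python
from enum import Enum


class ServerState(Enum):
    WAITING_FOR_IP = "waiting for IP"
    CONNECTING = "connecting"
    CONNECTED = "connected"


class PresenceModel:
    """Toy model of which plugin's buddies end up on the mesh view."""

    def __init__(self):
        # At startup the link local plugin (salut) is up first, and the
        # server plugin (gabble) has nothing to do until NM reports an IP.
        self.server = ServerState.WAITING_FOR_IP
        self.link_local_running = True

    def ip_acquired(self):
        # NM reports an IP address: the server plugin starts connecting.
        if self.server is ServerState.WAITING_FOR_IP:
            self.server = ServerState.CONNECTING

    def server_connected(self):
        # Once the server plugin is up, the link local plugin is stopped.
        if self.server is ServerState.CONNECTING:
            self.server = ServerState.CONNECTED
            self.link_local_running = False

    def ip_lost(self):
        # The server plugin stops itself, and link local is started again.
        self.server = ServerState.WAITING_FOR_IP
        self.link_local_running = True

    def visible_buddies(self):
        if self.server is ServerState.CONNECTED:
            return "server"
        return "link-local"
```

Calling ip_acquired(), server_connected() and ip_lost() in that order walks
through the full cycle described in the bullet list.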

 ii. what type of communication is taking place between NM and PS
 iii. when the connection is switched from linklocal to schoolserver (for
 example), what steps are taking place in the presence service
 iv. whether internet connectivity is detected by NM and sent to PS, or
 detected by PS

PS's server plugin watches NM signals on the D-Bus system bus.

Regards
Morgan



Re: Compiler optimization for Geode, migrating to F8, build system

2007-11-06 Thread Simon McVittie

On Mon, 05 Nov 2007 at 15:13:39 -0500, Bernardo Innocenti wrote:
 Yes.  The problem is that building a package or two in a separate
 environment is feasible, but rebuilding a whole distribution from
 scratch is *hard*.  It requires you to install and configure local
 copies of pilgrim, mock, createrepo, yum...

If this sort of thing is of interest to people, it's likely to be worth
examining how Debian does it - there are regular efforts to rebuild all
of Debian, usually for mass-bug-filing purposes (e.g. filing bugs
against all packages that don't build correctly with a newer version
of gcc, so that those bugs can be fixed before the newer version becomes
the default).

Because of Debian's decentralized nature, much of its infrastructure is
in a form that can be used to set up third-party repositories without
any particular permission or help from Debian itself.

In particular, Debian's debootstrap package (used to construct a minimal
chroot with a particular version of Debian) is so useful that it was
adopted for use as the first stage of the official installer (the
installer now uses a C reimplementation, cdebootstrap, but the principle
is the same), and the schroot and sbuild packages (used to set up a
chroot, and build packages in a chroot with automatic installation of
build dependencies, respectively) are also very well constructed.

http://buildd.debian.org/ runs the official package autobuilders, but
there are a number of third-party clones running the same software, such
as http://experimental.ftbfs.de/.

Simon


Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-06 Thread Simon McVittie

In reply to your previous mail, "iff" means "if and only if". It's often
used by mathematicians.

On Tue, 06 Nov 2007 at 03:23:39 -0500, Giannis Galanis wrote:
 What does "proper notification" mean? In which cases does it happen?

If Salut is explicitly asked to disconnect, it will tell Avahi to delete
all its mDNS records (this actually consists of re-sending all the
records it was advertising, with the Time To Live set to 0 seconds).
This is sometimes referred to as a goodbye packet. See
http://files.multicastdns.org/draft-cheshire-dnsext-multicastdns.txt
section 11.2 Goodbye Packets.

The only time we'll currently do this is when switching off Salut because
Gabble has connected successfully.
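
To make the effect of a goodbye packet concrete, here is a deliberately
simplified sketch of a record cache in which an announcement with TTL 0
removes the record and any other announcement resets its expiry time. The
class, its methods, and the record names are invented for illustration; this
is not how Avahi or Salut are implemented, and a real resolver may treat
goodbye records differently.

```python
class RecordCache:
    """Toy mDNS-style cache: record name -> time at which it expires."""

    def __init__(self):
        self._expires_at = {}

    def announce(self, name, ttl, now):
        if ttl == 0:
            # Goodbye packet: the record is re-sent with TTL 0, so drop it.
            self._expires_at.pop(name, None)
        else:
            # Any normal announcement (re)starts the record's timeout.
            self._expires_at[name] = now + ttl

    def present(self, now):
        # Records whose TTL has not yet run out.
        return sorted(n for n, t in self._expires_at.items() if t > now)


cache = RecordCache()
cache.announce("xo-1", ttl=120, now=0)
cache.announce("xo-2", ttl=120, now=0)
cache.announce("xo-1", ttl=0, now=61)   # xo-1 says goodbye
print(cache.present(now=61))            # only xo-2 is left
```

A laptop that simply drops off the network never sends the TTL 0
announcement, so in this model it stays listed until its TTL runs out.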

 Presumably it does not happen when an XO moves slowly to a place with poor
 connectivity.

This is never done in response to network conditions - we can't know that
we've lost network connectivity until it's too late.

If the Time To Live on our mDNS records expires, that should have the same
effect; however, as Sjoerd explained, we currently ignore that, because
the 1CC mesh network is apparently unstable enough that the TTL
sometimes expires even for laptops that are actually present.

 In the case of a temporary (short) disruption of connectivity, how much
 time does it generally take for it to return? You mentioned that in the past
 XOs were appearing and disappearing constantly. This implies that the
 common drop of connectivity is on the scale of a few seconds.

You tell me! :-) I don't have enough XOs to replicate the conditions of
a large mesh network like 1CC, so I can't comment on packet loss rates.
Perhaps Dan Williams (who used to maintain Presence Service) could help
you.

 If it is lost
 for more than a few minutes, then it is not bad for the XO to leave and
 return. So I believe that 1h or even 10min are too long as timeouts.

I believe we're currently using Avahi's default timeouts, which are
those recommended in the mDNS draft (linked above). If I'm right about
that, then we're using 120 second TTLs for the SRV and A records.

Assuming Salut and Avahi follow the draft's recommendations, this means
that for the records representing activities, buddies and laptops, if we
haven't seen an announcement of a particular record, we will:

- re-query after 96 - 98.4 seconds;
- if no reply, re-query after 102 - 104.4 seconds;
- if no reply, re-query after 114 - 116.4 seconds;
- if no reply, assume the record has vanished after 120 seconds.

(In each of the ranges given for the re-queries, the exact time is
chosen at random, to avoid simultaneous queries from everyone in the
network.)
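
The figures above can be reproduced with a little arithmetic. The sketch
below assumes the re-query points sit at 80%, 85% and 95% of the TTL with up
to 2% of the TTL added as random delay; those percentages are inferred from
the numbers in this message for a 120 second TTL, not read out of Avahi, and
the function names are made up.

```python
import random


def requery_windows(ttl, percents=(80, 85, 95), jitter_percent=2):
    """Return (earliest, latest) times in seconds for each re-query."""
    jitter = ttl * jitter_percent / 100
    return [(ttl * p / 100, ttl * p / 100 + jitter) for p in percents]


def pick_requery_times(ttl, rng=random):
    """Choose one random instant inside each window."""
    return [rng.uniform(start, end) for start, end in requery_windows(ttl)]


for start, end in requery_windows(120):
    print(f"re-query between {start:.1f} and {end:.1f} seconds")
# re-query between 96.0 and 98.4 seconds
# re-query between 102.0 and 104.4 seconds
# re-query between 114.0 and 116.4 seconds
```

With a different TTL the windows scale proportionally, which is why the TXT
records mentioned below take so much longer to time out.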

The timeout is reset as soon as we see any announcement of a record.

The only ones whose disappearance matters are the SRV and A records - if
a TXT record fails to disappear when it should, we don't really care.
TXT records have a substantially longer timeout (the draft recommends 75
minutes).

 There are a couple more things I would like to address:
 
 1. Is there a way to restart the presence service? That way we can
 resolve a weird state. Will killing and restarting the process work?

Only if client code that accesses the PS is amended to cope with this
(I just filed #4681 to represent this). Until #4681 is closed, if the PS
was restarted, nothing would work - use Ctrl+Alt+Backspace to restart all of
Sugar. Please see the bug for more details or to reply.

 2. At what point in the source code does the presence service
 i. try to connect to the jabber server?
 ii. run gabble?

I'll answer (ii.) first. Gabble is automatically run by the session bus
(dbus-daemon) via service activation, the first time the Presence Service
uses it, if it isn't already running. So there is no explicit code in the PS
to run Gabble.

OK, now (i.):

When Network Manager indicates that we have a valid IP address, we run
the _init_connection method of the ServerPlugin instance. If the Gabble
connection fails, we schedule a timer (currently 5 seconds) and retry
running _init_connection when the timer runs out. (classes
TelepathyPlugin and ServerPlugin, methods _init_connection,
_reconnect_cb, _could_connect, _handle_connection_status_change.)

What _init_connection does is: If there's already a Gabble connection and it's
connected, it'll be used. (class ServerPlugin, method
_find_existing_connection). Otherwise we make a new connection (method
_make_new_connection).

ServerPlugin (src/server_plugin.py) inherits from TelepathyPlugin
(src/telepathy_plugin.py) so some of the methods I mentioned are defined
in TelepathyPlugin, some in ServerPlugin, and some are defined in
TelepathyPlugin but overridden in ServerPlugin.
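
The reuse-or-create-then-retry behaviour described above might be sketched
roughly as follows. This is not the Presence Service code: the class name,
the injected callables, and the use of ConnectionError are assumptions made
so that the example is self-contained and runs without D-Bus or a main loop.
Only the 5 second retry delay comes from the description above.

```python
class ServerPluginSketch:
    RECONNECT_SECONDS = 5  # "currently 5 seconds" per the description above

    def __init__(self, find_existing, make_new, schedule):
        # find_existing() returns a connected connection or None.
        # make_new() returns a connection or raises ConnectionError.
        # schedule(delay, callback) arranges for callback() to run later.
        self._find_existing = find_existing
        self._make_new = make_new
        self._schedule = schedule
        self.connection = None

    def init_connection(self):
        conn = self._find_existing()
        if conn is None:
            try:
                conn = self._make_new()
            except ConnectionError:
                # Connecting failed: try the whole procedure again later.
                self._schedule(self.RECONNECT_SECONDS, self.init_connection)
                return
        self.connection = conn
```

In a test, schedule can simply append (delay, callback) to a list so the
retries can be driven by hand.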

 ii. what type of communication is taking place between NM and PS

D-Bus messages, on the system bus.

 iv. the internet connectivity is detected by NM and sent to PS, or detected
 by PS

Internet connectivity isn't really detected, as such. The PS listens for
signals from Network Manager that 

Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-06 Thread Simon McVittie
On Tue, 06 Nov 2007 at 12:05:59 +0200, Morgan Collett wrote:
 Giannis Galanis wrote:
  1. Is there a way to restart the presence service? That way we can
  resolve a weird state. Will killing and restarting the process work?
 
 Killing it will result in it being restarted. However, Sugar remains in
 an inconsistent state, with buddies stuck on the mesh view if they left
 while PS was not running, since PS never told Sugar that they left.

The situation would actually be worse than that, see #4681.

Simon


xf86eqEnqueue

2007-11-06 Thread Ricardo Carrano
Hi everybody!

Does anyone have any idea why we get this console message?
SIGIO not blocked at xf86eqEnqueue

This seems to happen in many (all?) builds I've been working with.

Thanks,
Ricardo Carrano



Re: when an xo loses connection, how long does it take to disappear from other's neighbor view?

2007-11-06 Thread Giannis Galanis
Thank you all for your replies. They clear up the picture a lot.

To summarize:

1. We need to fix the timeout for icons to disappear. Can we try Guillaume's
patch? Also, we need to be able to determine which icons are currently not
available (but still appearing). I believe that the failed entries in
_presence._tcp form a complete list. Is this correct?

2. We need to be able to restart PS. As you say, this is not possible, but if
we restart Sugar, will PS restart as well?

3. We need to force gabble to run. We have several instances of 4193 (almost
all XOs connected to schoolserver, AP are running salut). Or at least to
force an attempt to connect to the jabber server.

4. The process of trying to connect to the jabber server, is done by
telepathy-gabble, or by the presence


Software status meeting on IRC (today, 21:00 EST Boston)

2007-11-06 Thread Chris Ball
Hi,

We'll be having the regular software meeting on IRC (irc.freenode.net
#olpc-meeting) tonight at 9pm EST.  See you there!

Date/time (note daylight-savings has finished now):
   http://www.timeanddate.com/worldclock/fixedtime.html?month=11&day=06&year=2007&hour=21&min=0&sec=0&p1=43

URL for update-1 blockers by owner:
   http://tinyurl.com/2fuzpv

- Chris.
-- 
Chris Ball   [EMAIL PROTECTED]


Activities downgraded in Joyride 247

2007-11-06 Thread Bert Freudenberg
See http://dev.laptop.org/~bert/joyride-pkgs.html

Many activities were downgraded to older versions.

- Bert -




Re: Activities downgraded in Joyride 247

2007-11-06 Thread C. Scott Ananian
On 11/6/07, Bert Freudenberg [EMAIL PROTECTED] wrote:
 See http://dev.laptop.org/~bert/joyride-pkgs.html

 Many activities were downgraded to older versions.

Seems like I manually kicked off a build which stomped on a
build-in-progress. Oops. =(

I'm working on improvements to the process.
 --scott

-- 
 ( http://cscott.net/ )


Bitfrost Activity Isolation being turn ON.

2007-11-06 Thread Michael Stone
Dear @sugar and @devel,

The sugar, security, and release teams have agreed that the file-system
isolation features provided by Rainbow [1, 2] are close enough to
maturity to be turned on by default in Joyride.

Unfortunately, there will be serious short-term regressions caused by
this change -- for example, activities will not receive any data when
they are resumed until a blocking bug in datastore/Rainbow integration
[#3801] is resolved. 

We are committed to fixing these regressions in a timely fashion.
However, for this to happen, we need your help to find and diagnose
them. Please try to run the software through its paces so that we can
make informed decisions about how much isolation we can actually ship.
Also, please be prepared to update to the latest joyride snapshot on a
frequent basis as we discover and fix the worst regressions.

Finally, in order to help you determine whether bugs that you observe
are being caused by an interaction with isolation, you need to know the
procedure for turning it on and off.

TO TURN OFF ISOLATION:

  From a console, logged in as root:

    rm /etc/olpc-security && reboot

TO RE-ENABLE ISOLATION:

  From a console, logged in as root:

    touch /etc/olpc-security && reboot

[It actually suffices to restart X after modifying /etc/olpc-security,
but doing a full reboot will let you start off from a known good state.]

Thanks,

Michael, Marco, Tomeu, Ivan, Kim, Jim, and Walter.

[1]: http://wiki.laptop.org/go/Rainbow            -- overview page
[2]: http://wiki.laptop.org/go/Taste_the_Rainbow  -- source code tour



Re: Bitfrost Activity Isolation being turn ON.

2007-11-06 Thread elw


 Subject: Bitfrost Activity Isolation being turn ON.


BITFROST TURN ON.

LAUNCH EVERY ZIG.

MAKE YOUR TIME.

:-)

--elijah



Re: Shutting down fds prior to execvpe in rainbow/inject.py: joyride 247 under Qemu

2007-11-06 Thread Albert Cahalan
Marcus Leech writes:

 I experimentally put some code just before the execvpe() in
 inject.py to close FDs >= 3 and <= 10.  I picked 10 out of
 the air, but I wouldn't expect there to be many open file
 descriptors at that point.  Actually, given the semantics of dup(),
 you could use it to probe what the maximum FD number is just before
 execvpe(), so the terminating condition could be something
 like <= dup(0).

I don't see how dup() would help you. Remember, you could get
back fd 123 even if fd 12345 was the last one allocated and is
still in use. You get the lowest free fd.

You can do readdir() on /proc/self/fd to list them, being
careful to not close the fd used for reading the directory
until you have read the whole directory.
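
A minimal Python sketch of the /proc/self/fd approach, assuming Linux. The
function names are made up and this is not code from rainbow/inject.py.
os.listdir() reads the whole directory before returning and then closes the
descriptor it used, so that descriptor shows up in the listing but is
already gone; the fstat() check filters it out.

```python
import os

FD_DIR = "/proc/self/fd"  # Linux-specific


def list_open_fds():
    """Return the file descriptors currently open in this process."""
    fds = []
    for name in os.listdir(FD_DIR):
        fd = int(name)
        try:
            os.fstat(fd)
        except OSError:
            # The descriptor used to read the directory itself; it was
            # closed when the listing finished, so skip it.
            continue
        fds.append(fd)
    return sorted(fds)


def close_fds(min_fd=3, keep=()):
    """Close every open descriptor >= min_fd that is not in keep."""
    closed = []
    for fd in list_open_fds():
        if fd >= min_fd and fd not in keep:
            os.close(fd)
            closed.append(fd)
    return closed
```

Something like close_fds() could then be called just before os.execvpe().
Unlike probing with dup(), this finds descriptors however sparse the
numbering is, since dup() only ever reports the lowest free slot.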