Re: [E-devel] Eo error message in my application

The Rasterman Mon, 05 Dec 2016 17:29:24 -0800

On Mon, 5 Dec 2016 12:04:26 +0900 Jean-Philippe André <j...@videolan.org> said:


> On 5 December 2016 at 11:04, Carsten Haitzler <ras...@rasterman.com> wrote:
> 
> > On Mon, 5 Dec 2016 09:53:51 +0900 Jean-Philippe André <j...@videolan.org>
> > said:
> >
> > > On 4 December 2016 at 11:27, Carsten Haitzler <ras...@rasterman.com>
> > wrote:
> > >
> > > > On Sat, 3 Dec 2016 20:25:46 -0200 Gustavo Sverzut Barbieri <
> > > > barbi...@gmail.com>
> > > > said:
> > > >
> > > > > On Sat, Dec 3, 2016 at 7:13 PM, Vincent Torri <
> > vincent.to...@gmail.com>
> > > > wrote:
> > > > > > On Sat, Dec 3, 2016 at 7:44 PM, Gustavo Sverzut Barbieri
> > > > > > <barbi...@gmail.com> wrote:
> > > > > >> On Sat, Dec 3, 2016 at 1:57 PM, Vincent Torri <
> > > > vincent.to...@gmail.com>
> > > > > >> wrote:
> > > > > >>> Hello
> > > > > >>>
> > > > > >>> I'm rewriting my multi-document viewer. I'm using EFL git,  and i
> > > > have
> > > > > >>> that Eo error message :
> > > > > >>>
> > > > > >>> ERR<59568>:eo[T:3] lib/eo/eo.c:1663 efl_isa() Object 40007f56 is
> > not
> > > > a
> > > > > >>> valid object in this context: object domain: 0, current domain:
> > 2,
> > > > > >>> local domain: 2, available domains: [  1 2  ]
> > > > > >>>
> > > > > >>> does someone have an idea of the problem ?
> > > > > >>
> > > > > >> the new efl in git provides more context to that message since
> > > > > >> yesterday, particularly if you use EO_LIFECYCLE_DEBUG=1 (envvar),
> > > > > >> since I was so pissed with such nebulous messages :-D
> > > > > >
> > > > > > all the debug stuff does not work on Windows (it needs libunwind,
> > or
> > > > > > at least DWARF parsing to get the debug symbols), but i will test
> > on
> > > > > > Linux when I have time.
> > > > > >
> > > > > > i've also seen some preload stuff you mentioned, which does not
> > work
> > > > > > on Windows (it's a different method to do similar stuff)
> > > > > >
> > > > > >> usually the object is already dead or was never created... If
> > you're
> > > > > >> using threads (seems so due "[T:3]"), be aware that objects are
> > bound
> > > > > >> to the thread they were created and you need to "import"  to use,
> > > > > >> Raster sent some email to the list and I recall som tests in the
> > > > > >> tree....
> > > > > >
> > > > > > I indeed use an ecore_thread
> > > > >
> > > > > yes, reading the message again, not just the T:3 for the thread, but
> > > > > also the "object domain: 0" (main thread) while "current domain: 2"
> > > > > (secondary thread) would lead to a failure since raster added the
> > > > > protection.
> > > >
> > > > see the message is NOT nebulous. its giving you details the object
> > belongs
> > > > to
> > > > thread domain 0 but your current domain is 2. thus thread does not
> > belong
> > > > to
> > > > the same thread domain that created the object. domain 1 is the shared
> > > > domain
> > > > so should you see object domain being 1 and it still isn't found that
> > means
> > > > there is no object there at all ion the shared table.
> > > >
> > > > the message is very explicit as to the core details of the issue. it's
> > an
> > > > invalid object as far as the thread accessing it is concerned.
> > > >
> > > > > If you want to use auto-locks, then get the shared domain with:
> > > > >
> > > > > efl_domain_current_push(EFL_ID_DOMAIN_SHARED);
> > > > > my_obj = efl_add(...);
> > > > > efl_domain_current_pop();
> > > >
> > > > this isn't really an option UNLESS you created the class entirely
> > inherited
> > > > from base class. i.e. the object class doesn't otherwise use any other
> > > > non-threadsafe data outside the object.
> > > >
> > > > > check eo_test_general.c and Eo.h, they can give you an idea of what
> > > > > can you do... basically all objects are bound to one thread (domain),
> > > > > main thread or shared. Their parent must be in the same domain. This
> > > > > allows TLS of eoid stuff, avoiding locks and possibly avoiding
> > > > > concurrency bugs. Just the shared domains have locks as it used to be
> > > > > in the whole eo.
> > > >
> > > > yup. and even creating shared objects comes with limitations. the
> > intent
> > > > is for
> > > > very specialized objects to become shared objects so multiple threads
> > can
> > > > chare
> > > > them. the general idea is you will have  very very very few shared
> > > > objects, so
> > > > then the global mutex surrounding all shared objects isn't really a big
> > > > issue,but sometimes sharing an object is far easier and simpler than
> > > > anything
> > > > else, and you are not that concerned about contention and performance.
> > most
> > > > objects will be thread-local (and most efl objects will be main-loop
> > local
> > > > of
> > > > course).
> > > >
> > > > but the error right now is very detailed and tells you exactly what it
> > > > sees is
> > > > probably going wrong in the lookup so you can figure out if its a
> > > > cross-thread
> > > > access problem OR that the object actually does not exist. be aware
> > that
> > > > mainloop domain is 0, and the default domain for ALL other threads is
> > 2.
> > > > that
> > > > means create obj in one non-mainloop thread and access in another
> > > > non-mainloop
> > > > thread ... both objects will have domain 1 BUT will have separate TLS
> > based
> > > > local object tables so the same object may haver the same domain but
> > not be
> > > > accessible across tables (there is a small chance you could get a
> > > > false-positive access, but given i now starts generation counts off at
> > > > random
> > > > values per thread, its highly unlikely to have the same object id AND
> > > > generation count match).
> > > > <https://lists.sourceforge.net/lists/listinfo/enlightenment-devel>
> > > >
> > >
> > >
> > > Nah I totally agree with Vincent that the error message is cryptic.
> >
> > the message before was just "this is an invalid object". try figure out why
> > from that message:
> >
> 
> Nope, before my patch there was a silent return :)

well the generic eoid access error was there. you just didn't use it. :)

> >    ERR("obj_id %p is not a valid object. Maybe it has been freed or does
> > not
> > belong to your thread?", (void *)obj_id);
> >
> > was the object before. well the specific one was an extra error you added
> > in
> > eo_isa. the generic obj pointer is invalid message is:
> >
> >    const char *type = "object";
> >    if (obj_id & ((Eo_Id)1 << (REF_TAG_SHIFT - 1))) type = "class";
> >    ERR("EOID %p is not a valid %s. "
> >        "EOID domain=%i, current_domain=%i, local_domain=%i. "
> >        "EOID generation=%lx, id=%lx, ref=%i, super=%i. "
> >        "Thread self=%lu. "
> >        "Available domains [%s %s %s %s]. "
> >        "Maybe it has been deleted or does not belong to your thread?",
> >
> 
> 
> 
> >
> >        (void *)obj_id,
> >        type,
> >        (int)domain,
> >        (int)data->domain_stack[data->stack_top],
> >        (int)data->local_domain,
> >        (unsigned long)(obj_id & MASK_GENERATIONS),
> >        (unsigned long)(obj_id >> SHIFT_ENTRY_ID) & (MAX_ENTRY_ID |
> > MAX_TABLE_ID | MAX_MID_TABLE_ID),
> >        (int)(obj_id >> REF_TAG_SHIFT) & 0x1,
> >        (int)(obj_id >> SUPER_TAG_SHIFT) & 0x1,
> >        (unsigned long)eina_thread_self(),
> >        (data->tables[0]) ? "0" : " ",
> >        (data->tables[1]) ? "1" : " ",
> >        (data->tables[2]) ? "2" : " ",
> >        (data->tables[3]) ? "3" : " "
> >       );
> >
> > maybe it should use the generic error print?
> >
> 
> It wasn't easily doable. Copy & paste (and trimming the unnecessary stuff)
> was easier and cleaner, because efl_isa doesn't resolve the object data.
> Also the generic error is very cryptic. What the hell is a domain?

read eo api and it has domains and are documented.

> > > Anyway as far as I remember shared objects are not a viable solution
> > until
> > > we change the calling mechanism (function & data resolution) to be
> > > thread-safe, rather than locking the entire pool of shared objects for
> > the
> > > entire duration of a call.
> >
> > locking the whole pool means we can do shared objects without a whole
> > tonne of
> > extra work. as long as all things accessed inside object methods (including
> > parent classes) are entirely within the code for that object or are already
> > threadsafe then no extra locks have to be written at all. everything will
> > just
> > work and it's a massive amount of work saved. having to fine-grain lock
> > everything manually is just insane and that's why efl is not threadsafe and
> > never will be because no one is going to go do that. this is why many libs
> > are
> > not threadsafe. it's a massive amount of work to do and easy to get wrong
> > to
> > miss an lock or unlock somewhere. having a single locking and unlocking
> > spot
> > even if it is a "one big fat lock for every shared object" is safer and far
> > less work. it is unlikely to ever be a real performance issue.
> 
> 
> It won't be a performance issue because we won't use shared objects.

i'm saying they will be used rarely, not commonly.

> But wait. It's used right now by the filters. Oh that means a filter locks
> ALL the other shared objects while processing. But that's probably fine
> because the filters are insanely fast. Evas VG also uses shared objects. No
> problem either because those computations are also blazing fast.
> ecore_thread_future_run also uses a shared object (with sync progress cb).

both of these are quick-and-dirty solutions to the fact that the code was
before violating "accessing objects across threads". it was a quick and dirty
solution to what was already bad code. just doing a scope data get from a
thread involves rummaging around the eoid table which can be changing while you
poke around and thus requires locks. moving eoid tables to TLS to solve the
need for locks more efficiently broke code that already was going to crash
sometimes on object access. it was already broken code. it's just broken
differently now.

> I strongly disagree with you. The calling mechanism needs to be safe, not
> the entire pool. If the calling mechanism is safe, then an object can't die
> under your feet as the internal call ref can't be messed with from outside
> eo. As for accessing internal data we're only talking about standard
> multi-threaded programming.

but then you HAVE to add mutexes to all eo objects or have special shared vs
non-shared eo base classes to be able to do any threadsafe objects at all
because eo_base + core eo calls like efl_ref() etc. must then be threadsafe.
not just the calling mechanism. a mutex is rather large. it's 40 bytes just for
a single mutex. you'd have to use a recursive mutex anyway becaus eyou CAN'T
control the code inside a function as to when it calls a method on itself or
another object and then that comes back to call on the original, so a recursive
mutex would be a basic requirement no matter what PER object to ever be able to
implement thread safe objects.

> Having a purely thread-safe object would require a thread safe base class
> as well as an entire thread-aware class hierarchy. That's more work, but at
> least it wouldn't block ALL the objects that need a bit of thread safety.
> We most likely don't have the need for such objects right now.

it's a hell of a lot more work and who is going to do it? reality is the
objects now using shareable objects shbouldn't be doing so. it's a quick and
dirty hack around the fact that they were already violating "accessing an
object across threads" as above. the correct solution is to 

> While there may be a niche need for globally shared objects with a global
> lock, I can't see one at the moment. Maybe what we need is another domain.

there are 4 and we have used 3 of them so far. there is a spare. but unless
someone is going to:

1. implement a separate threadsaafe libeo.so (eo_ref, eo_unref, etc. as wlel as
all of base class).
2. actually ensure they implement all the locking and unlocking on entry and
exit correctly in every method for all of eo as well as eo base and all classes
that inherit from this... fun times.
3. you won't be ably to re-use existing interfaces so you'll need
efl_threadsafe_file and all those classes re-implemented in a threadsafe way
and with a separate threadsafe naming scheme in c...

shareable objects can use all existing classes. yes there is a catch if they
return "const" internal data (like const char * or a const Eina_List * etc.)
that is internal data that could be modified by another thread while being used
by the caller. yes you can create deadlocks, but you can create deadlocks in
any threaded system eventually.

you COULD solve the blocking calls by changing the classes for ector and
filters to not have blocking calls when they do work. instead they pipeline it
as a command queue and call callbacks async when done if you want to know.
reality is that evas_render has been moving this way because it frankly has to
be done for anything that is "slow" like rendering. anything that may block.

and even then they likely should have a slave worker threads behind the objects
doing the actual work with the objects themselves living in a thread on their
own.

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    ras...@rasterman.com


------------------------------------------------------------------------------
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today.http://sdm.link/xeonphi
_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel

Re: [E-devel] Eo error message in my application

Reply via email to