On Wed, Aug 10, 2011 at 11:47 PM, Gustavo Sverzut Barbieri <[email protected]> wrote: > On Tue, Aug 9, 2011 at 5:33 PM, Cedric BAIL <[email protected]> wrote: >> On Tue, Aug 9, 2011 at 2:22 PM, Gustavo Sverzut Barbieri >> <[email protected]> wrote: >> > Hi all, particularly cedric. I was to do the bindings for the new >> > ethumb_client_exists() API and got impressed by the number of >> > problems. >> > >> > - inconsistent api: void* is the first callback parameter, see >> > all EFL >> >> Agreed, but I followed previous callback format of Ethumb_Client, see >> Ethumb_Client_Generate_Cb. I have no problem about breaking this and >> follow the rest of the EFL API convention. In fact, I would really >> like that solution as I think Ethumb API is a mess right now. > > typedef void (*Ethumb_Client_Generate_Cb)(void *data, > Ethumb_Client *client, int id, const char *file, const char *key, > const char *thumb_path, const char *thumb_key, Eina_Bool success); > > as I said, void*data is the first... > > What of Ethumb API is a mess now? Could you clarify? And are you > talking about Ethumb or Ethumb_Client?
Only speaking about Ethumb_Client, the rest I don't really care. I typically don't like the generate cancel function, that need a new cancel_cb and a free_cb. I never saw any use of that stuff and it look like it over complexify things for nothing. >> > - inconsistent api: missing ethumb_CLIENT prefix, like >> > Ethumb_Exists should be Ethumb_Client_Exists >> >> The data type, right ? > > yes. > >> > - strange api: if I have an Ethumb_Exists handle why do I need >> > all the other parameters? Because Ethumb_Exists is an internal >> > thing exposed to user! You should return the >> > Ethumb_Async_Exists_Cb structure (and remove Cb from it) >> >> Maybe I should reverse the logic, have all Cb pointing to the same >> Ethumb_Async_Exists instance, that would solve the cancel parameter >> things. > > keep them split, listen to me :-) > >> > - memory leak: in _ethumb_client_exists_end() you free the >> > async->callbacks but nothing frees 'cb' nodes. >> >> Outch and not only there, but also in cancel. > > I see nothing wrong with cancel, actually it should call _end as well. Was wondering, but yes, it should. >> > - strange behavior: you can create lots of async requests, but >> > if you cancel one, all of them is cancelled! >> >> Nop, only the callback is cancelled. It will wait to have no more >> callback before cancelling the job. > > yeah, I failed to see it's just when the async->ref drops to zero... > damn I hate those refcount macros... > >> > - race condition: thread may start and finish before you add the >> > structure to the hash: >> > async->thread = ecore_thread_run(...); >> > eina_hash_direct_add(_exists_request, async->dup, async); >> >> Yup, should check ecore_thread_run returned value. > > I don't recall ecore_thread_run() behavior but I don't see how it > would help. It could have gathered the structure to return and right > before the function does the return the process can be preempted to > run the thread, that may finish it's stuff before being preempted. At > the end you return a dead thread. Like this: > thread-1: ecore_thread_run(), allocates return, start thread > thread-2: run and finishes > thread-1: return a finished thread Nop, never happen. The finish is done in the main loop when ecore get to run again, so after the function. No race condition here. That's the purpose of Ecore_Thread, you don't have to worry about that kind of race condition. >> > - strange behavior: provided cb is executed even when it was >> > cancelled >> >> Yes, it was done that way as it's the only way to be sure that the >> data are not used any more. But right now, if I reverse the loginc >> with callback and async stuff, it should not be needed anymore as >> callback will not be in the list anymore. >> >> > - bad API design: there is no free_cb for exists. If you cancel >> > it you still get the main cb but this is not clear. This is >> > bad for bindings. >> >> Agreed and unecessary now in fact, as the callback is removed from the >> list, there is no chance to be called later. > > please add a free callback. It makes bindings (or anyone doing proper > code) a mess to delete references to it and cleanup. Ok. >> > In my opinion there should be no single-thread per ethumb_client >> > thing. It's a micro optimization and is useless since most users will >> > ask just one. >> >> I don't think so, fire elementary_test ethumb test and you will see >> the difference directly. Grouping command are usefull when used by >> gengrid as you can be very quickly asked to create one >> thumbnail,destroy and asked again, or some times asked twice and >> destroyed once. It was possible to see the difference in excessive >> also. > > This is a problem with elementary's usage then. Although I really > really fail to see this happening a lot in common cases. You rarely > will have a thumb view with the same icon as we have in > elementary_test, that's just nonsense... we happen to have it as we > have few images to test, it's far from realistic. Ok, so try it in enki and excessive. That was my benchmark during that development, and it definitively help. Because during asynchronous build of the grid, stuff move up and down. Making request come and go, cancel and uncancelling during their life time. Computing MD5, calling stat and setting extended attribute do take enough time to give a chance to another request to uncancel the previous one. > Seriously, make it right first, if people come to provide a real use > case I can take a look at it myself. > > All in all if that turns to be a common case I'd rather cache the > paths in elm_ethumb than to fix this the "optimization" we have now. So putting the burden on the user of ethumb. I don't like duplicating code. > It is similar to my complains of the "exists" being async, god damn, > we're computing an MD5 of a path... this shouldn't be that much worse > than computing the eina_hash to go into a thread! Just check how other > parts of this process account and this part should be very minimal to > deserve such handling... look how the code grown, and which problems > went in for little benefit. If you think it is worth because it > stat()s the target file, then you can just wrap some eio call and > avoid all the mess around ethumb. No, the MD5 should go in a thread what ever you said. I did real benchmark, and it was together with the stat blocking the UI in excessive and enki. I first benchmarked enki/excessive to understand where the laggy did come from and it was coming from this MD5 + stat. That's why now it tries to load the MD5 from the extented attribute first (faster than computing MD5 on my laptop maybe even better with a SSD), if it didn't find it, it compute MD5, then run stat and store back MD5 if necessary. This helped a lot in making Ethumb usable by Enki and Excessive. And no, computing hash is way faster than computing MD5. Checking if a request has already been started, reduce race condition to almost none and help avoid doing to times the same thing. So that's a double win, worth it. >> Right now, the main problem of Ethumb are it's internal, to >> complex and really messy. Exposing Id is in my opinion a bad idea and >> the API is clearly a mess. This make it really difficult to fix the >> last issue I am facing with it, being to process one thumb per >> available CPU. > > Again, what in ethumb? now looks like the server/daemon. What you mean > with "exposing ld", what "ld"? What is messy? I really dislike the fact that we return internal Id directly to ethumb user. That's internal stuff, and it complexify attaching value to a pending request. > I really think that the ethumbd slave should be a single process as I > don't want it trashing even more a system. Let's see, if you're > getting lots of thumbnails requests: > 1- likely you're doing lots of work on your application (highest > interactivity priority) > 2- you'll be sending lots of dbus commands, that will consume work on its > own > 3- you'll wake the dbus-daemon to dispatch the commands > 4- you'll wake the ethumbd to queue requests > 5- you'll wake the slave to process the requests > > that's a lot of processes already, I guess adding more to fight for > CPU will make things worse. Okay, you may end the generation faster, > but that was never the point. The point of having async/daemon is > really to keep the main application responsive while things are being > generated on the background. Yes, that's the idea, but still using the hardware we have is also a good idea. I have 7 cores that are sitting idling when I open a directory full of picture, waiting for picture to come back. On embedded target, you can have some SSD with dual core and a dedicated JPEG decompressor, the current design of ethumb prevent the use of this. > Anyway, imposing a limitation is not something I want. There are > people with 16 cores out there and then it could be useful. In this > case I can guide you to get there. It should be simple: > 1- on elementary use various clients (a poll of clients), maybe add > an API to check how many ethumbd slaves are enabled (if this is > configured somewhere) > 2- on ethumbd creates various slaves and round-robin clients/requests to > them So you advertise to not use the dbus API, but directly call ethumbd slave from each application ? I don't like the idea of multiple slave. Evas already can take care of doing the load of image asynchronously and automatically taking advantage of any hardware available. Splitting it to many process wouldn't help and would make ethumb aware of more thing than it should. I really prefer that we go with one ethumbd slave and use asynchronous loading of image. The main advantage of dbus, is that it take of waking up one daemon and only one. If marshaling command take time, then we should add a way to send order directly to daemon without the use of dbus. We have all the infra to do it properly and fast, Eet and Ecore_Con to a specific local ipc socket. I would rather prefer that solution, than having multile process around fighting to access the CPU without any possible cooperation. > quite simple and direct. Possibly what you're trying to do is to mess > it all inside one single ethumb_client request and that is bad. Each > client is meant to be sequential fetching, although asynchronous. Just > look the amount of work to have this work... if you add simultaneous > to it, then it will get even worse, performance and usability (API) > wise. I don't see why it should be more complex than Eio. You ask a specific file to be thumbnailed, with one callback for when it's ready, another when it fail. You then go back to your work and when the daemon is done with it's business it call you back, tell you where the file is and that's it. >> Anyway thanks for your review, Ethumb need some love and I almost >> stoped my work in the middle as it worked for me... > > IMO just remove ethumb calls from threads, if you want to stat() from > a thread, then fine but shouldn't matter much. Fix the API as I said. > Leave multiple slaves out. It really matter, if you tested enki and excessive before that change, they were impossible to use at all. That's why I do cache MD5 result and do every thing in a thread, because it was hurting use with thousand of picture. > Take some time to understand how ethumb/ethumbd/ethumb_client works. > It is meant to be serial on each level and even so, due the async > nature, it is already quite complex. > Ideally it would be like: > > handle = ethumb_generate(file, parameters, end_cb) > > But C does not make it easy... 'parameters' there would be a mess by > itself. Either it would be a function with lots of parameters or a > struct you'd have to manage... If a structure you'd have to avoid > recreating/refilling it everytime, and then you'd face exactly the > same problems we do now and what you call it "messy". > > The current API tries to be convenient and easy to use, yet efficient. > It will cache things, make similar requests easy, etc. Just > understand you better use it in a serial way. I don't see an issue with serialisation. But the thing is that it should really be like evas, all operation should be just changing internal state not doing any computation, only generate should have been doing one. I see no problem with something like : params_set(ctx, params1); handle = ethumb_client_thumb_path_get(file1, done_cb, failed_cb, data); params_set(ctx, params2); handle = ethumb_client_thumb_path_get(file2, done_cb, failed_cb, data); Having to deal in the client api with both exist and generate is in my opinion really to low level. At the end, all application that use Ethumb in svn, start with a exists, then call generate. Some app do dupplicate their logic in both callback, because once you have the file, you really want to do the same. You don't care if it was just generated or if it already exists. You want something that never block your UI, is fast, light and simplify your retrieval of thumbnail. The rest, I really thing the developper that use ethumb don't want to care about. Maybe that's not the point of ethumb, and I should add all this logic in elm_thumb. And change all apps that do use ethumb directly to use elementary instead, but I think that we could really simplify the use of Ethumb without requiring another layer to do that. -- Cedric BAIL ------------------------------------------------------------------------------ Get a FREE DOWNLOAD! and learn more about uberSVN rich system, user administration capabilities and model configuration. Take the hassle out of deploying and managing Subversion and the tools developers use with it. http://p.sf.net/sfu/wandisco-dev2dev _______________________________________________ enlightenment-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/enlightenment-devel
