Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: On Tue, 2008-09-16 at 18:35 +0100, Martyn Russell wrote: Yes, it's much better. tracker-search works on the command line, but tracker-search-tool still kills the daemon. Which API call was it that kills the daemon again? Was it GetHitCountAll? I'm happy for you to merge once that last issue is resolved. I just ran the TST in valgrind and it was fine, with a few leaks of course. If you can't replicate it, then we can, as you say, sort it out at the hackfest. I have to leave in a few hours, so perhaps we can just do that. Thanks for all your (and your team's) hard work. I'm very happy with indexer-split and it now runs like a charm. Our pleasure. -- Regards, Martyn ___ tracker-list mailing list tracker-list@gnome.org http://mail.gnome.org/mailman/listinfo/tracker-list
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: On Fri, 2008-09-12 at 18:35 +0100, Martyn Russell wrote: Jamie McCracken wrote: Note that existing config options must be respected, as otherwise upgrading will be impossible for existing users. This option is not honoured, then. I do it one of two ways right now, either: 1. trackerd -d evolution or 2. DisabledModules=evolution; (in the config file). The IndexEvolutionEmails option must have been overlooked. Can I ask, have you tried removing your config file too? No, because it needs to work with existing settings. I have fixed this issue now, so legacy options like IndexEvolutionEmails now work. I will look into working on an upgrade path to fix this on Monday. OK - I will be travelling to the UK on Monday; I will try and check again on Tuesday. OK. We have fixed a whole host of issues in the last 2 days, one of which was the huge slowdown in indexing speed. Other efficiency improvements have been made too. Which file are you checking for indexed words? All of them - I just used my name, jamie, as the search term, since it should match all files because they are under /home/jamie. I installed the packages on the N810 device just tonight and this all works as it should:
Checking for extensions by partial name matching:
$ tracker-search -s Files pdf
Results:
/home/user/MyDocs/.documents/osso_software_copyright.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/User_guide_English_US.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/Gebruikershandleiding_Nederlands.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/User_guide_English_GB.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/Brukermanual_Norsk.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/User_guide_Arabic.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/Brugervejledning_Dansk.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/Bedienungsanleitung_Deutsch.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/Manuale_d'uso_Italiano.pdf
Searching for all folders:
$ tracker-files -s Folders
Results:
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides
Searching for all music:
$ tracker-files -s Music
Results:
/home/user/MyDocs/.sounds/Moby-In_My_Heart.mp3
We can see if we can help you in Berlin. See you there! :) -- Regards, Martyn
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: On Wed, 2008-09-10 at 23:20 -0400, Jamie McCracken wrote: I'm afraid I'm unable to run the latest svn; trackerd dies every time I try to search for something. I did the following: svn up; make distclean; make; sudo make install; sudo rm -rf /usr/bin/trackerd; sudo rm -rf /usr/bin/tracker-indexer; rm -rf ~/.cache/tracker; rm -rf ~/.local/share/tracker; trackerd -v 3. Searching with tracker-search-tool crashes trackerd; searching with tracker-search returns no results (when it should). I don't see this crash. Note: in the last 24 hours, Mikael has fixed a couple of nasty issues which could improve your situation, namely metadata date handling issues and MP3 JPEG extractor fixes. I also changed the extractor to use libtracker-common functions instead of the duplicated code it was using. trackerd output shows it continuously printing: Tracker-Message: Indexed 105/425, module:'evolution', 07m 13s left, 02m 22s elapsed Note: it will print status messages roughly every 10 seconds to keep the daemon and applet up to date, even if nothing has happened. Note I have Evolution email indexing disabled. How have you disabled that? I tried it with -d evolution and with the DisabledModules config option in the .cfg file. Both worked fine for me. Can you verify it runs correctly with a reindex when Evolution email indexing is set to false? Yes, I have done that twice. I have no problems with the indexing or searching with the search tools either. I have tried this on my desktop and on the Nokia device too. Both work properly. Are you able to try on another machine? For me it appears the index never flushes, as it constantly tries to index Evolution stuff but the indexer rejects it. Hmm, if you run the daemon with -v 3 it should say, when it gets to the evolution module, whether it is disabled or not. You don't have the Evolution mail directory in your WatchDirectoryRoots, do you?
Also, can you confirm whether tracker-search-tool crashes trackerd when you supply a search term that does not exist in the index? I tested that: with a word similar to another, it suggests something (which shows results when I click on it), and with something completely incomprehensible it just says it couldn't find anything. I don't get any crashes. I have asked Philip to do the same, to make sure it isn't something I am doing. I have a feeling it might be the content you are indexing. Does it crash for you if you index a small selection of files? If you could try a couple of places with only a few files in a directory, that would really help us identify whether it is content-based or not. -- Regards, Martyn
Re: [Tracker] more issues with indexer-split
On Fri, 2008-09-12 at 10:27 +0100, Martyn Russell wrote: Jamie McCracken wrote: On Wed, 2008-09-10 at 23:20 -0400, Jamie McCracken wrote: I'm afraid I'm unable to run the latest svn; trackerd dies every time I try to search for something. I did the following: svn up; make distclean; make; sudo make install; sudo rm -rf /usr/bin/trackerd; sudo rm -rf /usr/bin/tracker-indexer; rm -rf ~/.cache/tracker; rm -rf ~/.local/share/tracker; trackerd -v 3. Searching with tracker-search-tool crashes trackerd; searching with tracker-search returns no results (when it should). I don't see this crash. Note: in the last 24 hours, Mikael has fixed a couple of nasty issues which could improve your situation, namely metadata date handling issues and MP3 JPEG extractor fixes. I also changed the extractor to use libtracker-common functions instead of the duplicated code it was using. trackerd output shows it continuously printing: Tracker-Message: Indexed 105/425, module:'evolution', 07m 13s left, 02m 22s elapsed Note: it will print status messages roughly every 10 seconds to keep the daemon and applet up to date, even if nothing has happened. I know, but it just outputs the above continuously with no change in the indexed count, and it does not flush to the index, which means I can't search for anything. Note I have Evolution email indexing disabled. How have you disabled that? In the tracker.cfg file in ~/.config/tracker I set IndexEvolutionEmails=false. Note that existing config options must be respected, as otherwise upgrading will be impossible for existing users. I tried it with -d evolution and with the DisabledModules config option in the .cfg file. Both worked fine for me. Can you verify it runs correctly with a reindex when Evolution email indexing is set to false? Yes, I have done that twice. I have no problems with the indexing or searching with the search tools either. I have tried this on my desktop and on the Nokia device too. Both work properly. Are you able to try on another machine?
Nope. For me it appears the index never flushes, as it constantly tries to index Evolution stuff but the indexer rejects it. Hmm, if you run the daemon with -v 3 it should say, when it gets to the evolution module, whether it is disabled or not. It does not say anything about that. You don't have the Evolution mail directory in your WatchDirectoryRoots, do you? No. Also, can you confirm whether tracker-search-tool crashes trackerd when you supply a search term that does not exist in the index? I tested that: with a word similar to another, it suggests something (which shows results when I click on it), and with something completely incomprehensible it just says it couldn't find anything. I don't get any crashes. I have asked Philip to do the same, to make sure it isn't something I am doing. I have a feeling it might be the content you are indexing. Does it crash for you if you index a small selection of files? If you could try a couple of places with only a few files in a directory, that would really help us identify whether it is content-based or not. I will play some more, but the index file size indicates it is empty, so no flushing has occurred. jamie
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: On Fri, 2008-09-12 at 10:27 +0100, Martyn Russell wrote: I know, but it just outputs the above continuously with no change in the indexed count, and it does not flush to the index, which means I can't search for anything. Flushing doesn't happen when the status message is printed. It is done once a minute, as I recall. Also, you can't use the index until it is closed by the indexer (which is usually when it finishes OR, I *think*, when a request comes in to the daemon). Note I have Evolution email indexing disabled. How have you disabled that? In the tracker.cfg file in ~/.config/tracker I set IndexEvolutionEmails=false. Note that existing config options must be respected, as otherwise upgrading will be impossible for existing users. This option is not honoured, then. I do it one of two ways right now, either: 1. trackerd -d evolution or 2. DisabledModules=evolution; (in the config file). The IndexEvolutionEmails option must have been overlooked. Can I ask, have you tried removing your config file too? I will look into working on an upgrade path to fix this on Monday. Are you able to try on another machine? Nope. Well, I have tried in two different locations and Philip has tried to reproduce your issues too. It works for us :/ Also, can you confirm whether tracker-search-tool crashes trackerd when you supply a search term that does not exist in the index? I tested that: with a word similar to another, it suggests something (which shows results when I click on it), and with something completely incomprehensible it just says it couldn't find anything. I don't get any crashes. I have asked Philip to do the same, to make sure it isn't something I am doing. I have a feeling it might be the content you are indexing. Does it crash for you if you index a small selection of files? If you could try a couple of places with only a few files in a directory, that would really help us identify whether it is content-based or not.
I will play some more, but the index file size indicates it is empty, so no flushing has occurred. Email is a special case: it isn't 1 file = 1 index increment, because there are parts of emails which can be considered unique units to index. Other than that, I have noticed this and we can improve on it. Which file are you checking for indexed words? -- Regards, Martyn
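A minimal C sketch of the behaviour being asked for in this exchange, assuming only what the thread states: a module is skipped if it is named with trackerd -d, listed in the semicolon-separated DisabledModules value, or, for backwards compatibility, if the legacy IndexEvolutionEmails key is false. The names module_in_list() and module_is_disabled() are hypothetical and are not Tracker's API; reading the .cfg file itself is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `module` is one of the entries of a semicolon-separated
 * list such as "evolution;" (the DisabledModules format quoted in the
 * thread). Empty entries, e.g. from a trailing ';', never match. */
static bool module_in_list(const char *list, const char *module)
{
    size_t len = strlen(module);
    const char *p = list;

    if (len == 0)
        return false;

    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ';');
        size_t item_len = end ? (size_t)(end - p) : strlen(p);

        if (item_len == len && strncmp(p, module, len) == 0)
            return true;

        p = end ? end + 1 : NULL;
    }
    return false;
}

/* Hypothetical policy combining the three ways of disabling a module
 * mentioned in the thread:
 *   cmdline_module         - value passed with "trackerd -d", or NULL
 *   disabled_modules       - DisabledModules value from the .cfg, or NULL
 *   index_evolution_emails - legacy IndexEvolutionEmails key (true if unset) */
bool module_is_disabled(const char *module,
                        const char *cmdline_module,
                        const char *disabled_modules,
                        bool index_evolution_emails)
{
    if (cmdline_module != NULL && strcmp(cmdline_module, module) == 0)
        return true;
    if (disabled_modules != NULL && module_in_list(disabled_modules, module))
        return true;
    /* Legacy option: honour it so existing configs keep working. */
    if (strcmp(module, "evolution") == 0 && !index_evolution_emails)
        return true;
    return false;
}
```

Treating the legacy key as one more input to the same check, rather than rewriting the user's config, is one way to keep existing settings working without a separate upgrade step.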
Re: [Tracker] more issues with indexer-split
On Fri, 2008-09-12 at 18:35 +0100, Martyn Russell wrote: Jamie McCracken wrote: On Fri, 2008-09-12 at 10:27 +0100, Martyn Russell wrote: I know, but it just outputs the above continuously with no change in the indexed count, and it does not flush to the index, which means I can't search for anything. Flushing doesn't happen when the status message is printed. It is done once a minute, as I recall. Also, you can't use the index until it is closed by the indexer (which is usually when it finishes OR, I *think*, when a request comes in to the daemon). Note I have Evolution email indexing disabled. How have you disabled that? In the tracker.cfg file in ~/.config/tracker I set IndexEvolutionEmails=false. Note that existing config options must be respected, as otherwise upgrading will be impossible for existing users. This option is not honoured, then. I do it one of two ways right now, either: 1. trackerd -d evolution or 2. DisabledModules=evolution; (in the config file). The IndexEvolutionEmails option must have been overlooked. Can I ask, have you tried removing your config file too? No, because it needs to work with existing settings. I will look into working on an upgrade path to fix this on Monday. OK - I will be travelling to the UK on Monday; I will try and check again on Tuesday. Are you able to try on another machine? Nope. Well, I have tried in two different locations and Philip has tried to reproduce your issues too. It works for us :/ Also, can you confirm whether tracker-search-tool crashes trackerd when you supply a search term that does not exist in the index? I tested that: with a word similar to another, it suggests something (which shows results when I click on it), and with something completely incomprehensible it just says it couldn't find anything. I don't get any crashes. I have asked Philip to do the same, to make sure it isn't something I am doing. I have a feeling it might be the content you are indexing. Does it crash for you if you index a small selection of files?
If you could try a couple of places with only a few files in a directory, that would really help us identify whether it is content-based or not. I will play some more, but the index file size indicates it is empty, so no flushing has occurred. Email is a special case: it isn't 1 file = 1 index increment, because there are parts of emails which can be considered unique units to index. Other than that, I have noticed this and we can improve on it. Which file are you checking for indexed words? All of them - I just used my name, jamie, as the search term, since it should match all files because they are under /home/jamie. jamie
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: I'm afraid I'm unable to run the latest svn; trackerd dies every time I try to search for something. I did the following: svn up; make distclean. I would use: make maintainer-clean. make; sudo make install. Make sure you are not running trackerd or tracker-indexer first. Before doing any installing, or even the make maintainer-clean, I would run: sudo make uninstall; sudo rm -Rf /usr/bin/tracker*; sudo rm -Rf /usr/libexec/tracker*; sudo rm -Rf /usr/lib/tracker/; sudo rm -Rf /usr/share/tracker/; rm -Rf ~/.cache/tracker; rm -Rf ~/.local/share/tracker; rm -Rf ~/.config/tracker. I would also run: find /usr -name '*tracker*' and make sure EVERYTHING is removed first. Then, when you run autogen again, use: CFLAGS="-g -O0" ./autogen.sh --prefix=/usr --localstatedir=/var --sysconfdir=/etc (the CFLAGS are there in case you need to run it under valgrind or gdb). BUT even better, check out the branch again from scratch to be sure. sudo rm -rf /usr/bin/trackerd; sudo rm -rf /usr/bin/tracker-indexer. There is also the thumbnailer, which is no longer installed in /usr/bin. rm -rf ~/.cache/tracker; rm -rf ~/.local/share/tracker. As above, don't forget the config. trackerd -v 3. I would make sure you run a specific binary instead of just letting the PATH find trackerd, i.e. /usr/libexec/trackerd. Searching with tracker-search-tool crashes trackerd; searching with tracker-search returns no results (when it should). I have freshly installed the packages on the Maemo device this week and tracker-search works fine (of course the TST doesn't). trackerd output shows it continuously printing: Tracker-Message: Indexed 105/425, module:'evolution', 07m 13s left, 02m 22s elapsed Note I have Evolution email indexing disabled. I will check this. I know it happens; I just need to fix it. I found indexing today to be incredibly slow, so I was just investigating that. -- Regards, Martyn
Re: [Tracker] more issues with indexer-split
Martyn Russell wrote: Jamie McCracken wrote: Searching with tracker-search-tool crashes trackerd; searching with tracker-search returns no results (when it should). I have freshly installed the packages on the Maemo device this week and tracker-search works fine (of course the TST doesn't). And it works - I meant to say. -- Regards, Martyn
Re: [Tracker] more issues with indexer-split
I'm afraid I'm unable to run the latest svn; trackerd dies every time I try to search for something. I did the following: svn up; make distclean; make; sudo make install; sudo rm -rf /usr/bin/trackerd; sudo rm -rf /usr/bin/tracker-indexer; rm -rf ~/.cache/tracker; rm -rf ~/.local/share/tracker; trackerd -v 3. Searching with tracker-search-tool crashes trackerd; searching with tracker-search returns no results (when it should). trackerd output shows it continuously printing: Tracker-Message: Indexed 105/425, module:'evolution', 07m 13s left, 02m 22s elapsed Note I have Evolution email indexing disabled. jamie Jamie McCracken wrote: On Tue, 2008-09-09 at 18:02 +0200, Philip Van Hoof wrote: On Tue, 2008-09-09 at 11:56 -0400, Jamie McCracken wrote: On Tue, 2008-09-09 at 16:30 +0100, Martyn Russell wrote: Jamie, can we try to get this merge done this week, please? Sure - I just need to check your additions, which I will try to do tonight. Obviously, post-merge you will have to submit major patches to me, so it's best to get as much in as possible before that. It might make sense for us to continue bleeding-edge development in the branch and, after (your) intensive review, merge that diff to trunk. Not sure how others feel about this? I don't mind if you want to sync what you have so far with trunk and then continue on indexer-split. What you have so far is not ready for a release, so it does not make much difference whether we merge or not at this point. Obviously, if Martyn and co feel strongly about merging then I will oblige. I have full confidence in you guys and am happy to prove it :) Philip is right. From day to day, when we are all working (and not on vacation like now), we tend to average anywhere from 5 to 20 commits a day. That's quite a lot of review work for you. There are 5 of us working on Tracker right now and it has been that way for a number of months.
Unless you want more email, of course :P Anything major would be discussed with you before it was implemented, and potentially done in a separate branch anyway. The only thing I would say is that creating a diff for TRUNK is probably not a good idea. It is probably best to just copy everything right over after doing a pre-merge tag. There are a LOT of file differences and created/deleted files too. If you want me to do the merge, I can, just let me know.
Re: [Tracker] more issues with indexer-split
On Wed, 2008-09-10 at 23:20 -0400, Jamie McCracken wrote: I'm afraid I'm unable to run the latest svn; trackerd dies every time I try to search for something. I did the following: svn up; make distclean; make; sudo make install; sudo rm -rf /usr/bin/trackerd; sudo rm -rf /usr/bin/tracker-indexer; rm -rf ~/.cache/tracker; rm -rf ~/.local/share/tracker; trackerd -v 3. Searching with tracker-search-tool crashes trackerd; searching with tracker-search returns no results (when it should). trackerd output shows it continuously printing: Tracker-Message: Indexed 105/425, module:'evolution', 07m 13s left, 02m 22s elapsed Note I have Evolution email indexing disabled. jamie Can you verify it runs correctly with a reindex when Evolution email indexing is set to false? For me it appears the index never flushes, as it constantly tries to index Evolution stuff but the indexer rejects it. Also, can you confirm whether tracker-search-tool crashes trackerd when you supply a search term that does not exist in the index? jamie
Re: [Tracker] more issues with indexer-split
Martyn Russell wrote: Martyn Russell wrote: Hi, So I have been reading up on the things that remain before merging. This is the list I have so far, which I will be working on: * Check the move files/directories issue. I *think* it works. Work on this will be continuing Monday. This should be fixed now. The to and from strings were simply the wrong way round. The code has also been improved here to use the file event queue, which means we can make full use of the state machine. * Make private libraries .so files to dynamically load them. I have made libtracker-common, libtracker-db and libstemmer all .so files. But I think what you meant was to make each language a .so which we dlopen() using something like GModule, right? That we can possibly do next week. I don't think this should stop the merge, to be honest. There are bigger problems to address first. I will start on this tomorrow. But we don't need it for the merge. * The directory mtime issue on startup. This is fixed. Have I missed anything? New items: * Check mtime for summary files too. I have yet to do this. I spoke briefly to Carlos about it. His opinion was that it isn't necessary. I agree that it certainly isn't necessary for the merge. I can look into this some time after the .so issue. Jamie, can we try to get this merge done this week, please? -- Regards, Martyn
Re: [Tracker] more issues with indexer-split
On Tue, 2008-09-09 at 16:30 +0100, Martyn Russell wrote: Martyn Russell wrote: Martyn Russell wrote: Hi, So I have been reading up on the things that remain before merging. This is the list I have so far, which I will be working on: * Check the move files/directories issue. I *think* it works. Work on this will be continuing Monday. This should be fixed now. The to and from strings were simply the wrong way round. The code has also been improved here to use the file event queue, which means we can make full use of the state machine. * Make private libraries .so files to dynamically load them. I have made libtracker-common, libtracker-db and libstemmer all .so files. But I think what you meant was to make each language a .so which we dlopen() using something like GModule, right? That we can possibly do next week. I don't think this should stop the merge, to be honest. There are bigger problems to address first. I will start on this tomorrow. But we don't need it for the merge. * The directory mtime issue on startup. This is fixed. Have I missed anything? New items: * Check mtime for summary files too. I have yet to do this. I spoke briefly to Carlos about it. His opinion was that it isn't necessary. I agree that it certainly isn't necessary for the merge. I can look into this some time after the .so issue. Jamie, can we try to get this merge done this week, please? Sure - I just need to check your additions, which I will try to do tonight. Obviously, post-merge you will have to submit major patches to me, so it's best to get as much in as possible before that. jamie
Re: [Tracker] more issues with indexer-split
On Tue, 2008-09-09 at 11:56 -0400, Jamie McCracken wrote: On Tue, 2008-09-09 at 16:30 +0100, Martyn Russell wrote: Jamie, can we try to get this merge done this week, please? Sure - I just need to check your additions, which I will try to do tonight. Obviously, post-merge you will have to submit major patches to me, so it's best to get as much in as possible before that. It might make sense for us to continue bleeding-edge development in the branch and, after (your) intensive review, merge that diff to trunk. Not sure how others feel about this? The thing is that our team is changing things quite fast. For our team we need a shared repository anyway. So either it would be a branch, or we'd have our own private team repository (git-like, or indeed a git one, which might make a lot of sense). Obviously we prefer to do things in the upstream project as soon as possible. Private repositories are not cool, in my opinion at least. But having to wait for individual approval of each and every patch ... would also block our methodology a little bit. Anyway ... my proposal for post-merge is to discuss this together at the Maemo Desktop Search hackfest in Berlin. -- Philip Van Hoof, freelance software developer home: me at pvanhoof dot be gnome: pvanhoof at gnome dot org http://pvanhoof.be/blog http://codeminded.be
Re: [Tracker] more issues with indexer-split
On Tue, 2008-09-09 at 18:02 +0200, Philip Van Hoof wrote: On Tue, 2008-09-09 at 11:56 -0400, Jamie McCracken wrote: On Tue, 2008-09-09 at 16:30 +0100, Martyn Russell wrote: Jamie, can we try to get this merge done this week, please? Sure - I just need to check your additions, which I will try to do tonight. Obviously, post-merge you will have to submit major patches to me, so it's best to get as much in as possible before that. It might make sense for us to continue bleeding-edge development in the branch and, after (your) intensive review, merge that diff to trunk. Not sure how others feel about this? I don't mind if you want to sync what you have so far with trunk and then continue on indexer-split. What you have so far is not ready for a release, so it does not make much difference whether we merge or not at this point. Obviously, if Martyn and co feel strongly about merging then I will oblige. I have full confidence in you guys and am happy to prove it :) jamie
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: On Tue, 2008-09-09 at 18:02 +0200, Philip Van Hoof wrote: On Tue, 2008-09-09 at 11:56 -0400, Jamie McCracken wrote: On Tue, 2008-09-09 at 16:30 +0100, Martyn Russell wrote: Jamie, can we try to get this merge done this week, please? Sure - I just need to check your additions, which I will try to do tonight. Obviously, post-merge you will have to submit major patches to me, so it's best to get as much in as possible before that. It might make sense for us to continue bleeding-edge development in the branch and, after (your) intensive review, merge that diff to trunk. Not sure how others feel about this? I don't mind if you want to sync what you have so far with trunk and then continue on indexer-split. What you have so far is not ready for a release, so it does not make much difference whether we merge or not at this point. Obviously, if Martyn and co feel strongly about merging then I will oblige. I have full confidence in you guys and am happy to prove it :) Philip is right. From day to day, when we are all working (and not on vacation like now), we tend to average anywhere from 5 to 20 commits a day. That's quite a lot of review work for you. There are 5 of us working on Tracker right now and it has been that way for a number of months. Unless you want more email, of course :P Anything major would be discussed with you before it was implemented, and potentially done in a separate branch anyway. The only thing I would say is that creating a diff for TRUNK is probably not a good idea. It is probably best to just copy everything right over after doing a pre-merge tag. There are a LOT of file differences and created/deleted files too. If you want me to do the merge, I can, just let me know. -- Regards, Martyn
Re: [Tracker] more issues with indexer-split
Martyn Russell wrote: Hi, So I have been reading up on the things that remain before merging. This is the list I have so far, which I will be working on: * Check the move files/directories issue. I *think* it works. Work on this will be continuing Monday. * Fix the get_file_contents() function so it checks for #13 in the first 64KB. This should be fixed now. I actually took a different approach on this. The code has a #define at the top to switch the behaviour here between: * Validating up to the last valid UTF-8 character (default). * Checking for valid UTF-8; if it is not valid, trying to convert from the locale encoding, and if that is unsuccessful, dropping the file (the behaviour in TRUNK). * Make private libraries .so files to dynamically load them. I have made libtracker-common, libtracker-db and libstemmer all .so files. But I think what you meant was to make each language a .so which we dlopen() using something like GModule, right? That we can possibly do next week. I don't think this should stop the merge, to be honest. There are bigger problems to address first. * The directory mtime issue on startup. Work on this will be continuing on Monday. Have I missed anything? New items: * Check mtime for summary files too. I'm also adding my tracker-fts stuff into that branch, so will likely merge when the above plus my stuff is ready. Yea. Can you make sure your code compiles without warnings before committing in the future, if possible :) With the new code additions, make distcheck fails and building packages is impossible. For now, I have made your code optional (disabled by default). To compile it, pass --enable-sqlite-fts to configure. -- Regards, Martyn
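The default behaviour described above (keep the text up to the last valid UTF-8 character) can be sketched in plain C as a function that returns the length of the longest well-formed UTF-8 prefix of a buffer. This is an illustration under that assumption, not the code in get_file_contents(); in GLib-based code, g_utf8_validate() reports the same boundary through its end out-parameter. The name utf8_valid_prefix_len() is hypothetical.

```c
#include <stddef.h>

/* Returns how many leading bytes of buf[0..len) form complete, well-formed
 * UTF-8 sequences; everything after that point would be dropped. Overlong
 * encodings, UTF-16 surrogates and code points above U+10FFFF count as
 * invalid, as does a multi-byte sequence cut off by the end of the buffer. */
size_t utf8_valid_prefix_len(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;      /* continuation bytes expected after the lead byte */
        unsigned long cp; /* decoded code point */

        if (c < 0x80) {                 /* ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3; cp = c & 0x07;
        } else {
            break;                      /* invalid lead byte */
        }

        if (len - i - 1 < need)
            break;                      /* sequence truncated by end of buffer */

        size_t k;
        for (k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                break;                  /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (k <= need)
            break;

        if ((need == 2 && cp < 0x800) ||                      /* overlong */
            (need == 3 && (cp < 0x10000 || cp > 0x10FFFF)) ||
            (cp >= 0xD800 && cp <= 0xDFFF))                   /* surrogate */
            break;

        i += need + 1;
    }
    return i;
}
```

A caller would then index only the first utf8_valid_prefix_len(buf, len) bytes instead of rejecting the whole file, which is the difference from the TRUNK behaviour listed above.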
Re: [Tracker] more issues with indexer-split
On Fri, 2008-09-05 at 13:19 +0100, Martyn Russell wrote: Martyn Russell wrote: Hi, So I have been reading up on the things that remain before merging. This is the list I have so far, which I will be working on: * Check the move files/directories issue. I *think* it works. Work on this will be continuing Monday. * Fix the get_file_contents() function so it checks for #13 in the first 64KB. This should be fixed now. I actually took a different approach on this. The code has a #define at the top to switch the behaviour here between: * Validating up to the last valid UTF-8 character (default). * Checking for valid UTF-8; if it is not valid, trying to convert from the locale encoding, and if that is unsuccessful, dropping the file (the behaviour in TRUNK). * Make private libraries .so files to dynamically load them. I have made libtracker-common, libtracker-db and libstemmer all .so files. But I think what you meant was to make each language a .so which we dlopen() using something like GModule, right? That we can possibly do next week. I don't think this should stop the merge, to be honest. There are bigger problems to address first. * The directory mtime issue on startup. Work on this will be continuing on Monday. Have I missed anything? New items: * Check mtime for summary files too. I'm also adding my tracker-fts stuff into that branch, so will likely merge when the above plus my stuff is ready. Yea. Can you make sure your code compiles without warnings before committing in the future, if possible :) Most of those warnings are in the original SQLite source - I will try and fix them. Thanks for your efforts. jamie
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: On Tue, 2008-09-02 at 12:23 +0100, Martyn Russell wrote: Jamie McCracken wrote: Could we also reduce memory usage by not statically linking to the private libs libtracker-common and libtracker-db? Those libraries should not be available for public use. Before doing so, each API would have to be: a) Documented b) Checked it needs to be public c) Versioned d) ... This is a lot of work and I don't think it is worth it. I haven't looked at the footprints myself, though. Why would we do all that? We would not be exporting the headers for those libs, so no apps outside of the tracker source tree would be able to use them effectively. Surely there are some examples of private libs that are not statically linked? I clearly misunderstood. I thought you meant making them available for public use. I think making them .so libs that are only used privately is a good idea. -- Regards, Martyn
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: On Tue, 2008-09-02 at 12:23 +0100, Martyn Russell wrote: Jamie McCracken wrote: Could we also reduce memory usage by not statically linking to the private libs libtracker-common and libtracker-db? Those libraries should not be available for public use. Before doing so, each API would have to be: a) Documented b) Checked it needs to be public c) Versioned d) ... This is a lot of work and I don't think it is worth it. I haven't looked at the footprints myself though. currently my FTS module and the file-indexer-module are ~ 1MB in size due mostly to linking with them and I'm sure the size of trackerd and tracker-indexer could be made smaller too with only one instance of those libs in memory How does the memory footprint compare to the old tracker? having looked at the contents of libtracker-common, most of the memory used is for the stemmers - we load them all into memory even though we only use one of them. I think making each language stemmer a dynamically loaded module should help reduce things I can look into doing this. -- Regards, Martyn
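A minimal sketch of loading one stemmer per language on demand with POSIX dlopen(), assuming each language is built as its own shared object. The module file name pattern, the stem_word symbol and the load_stemmer() helper are all hypothetical; tracker would more likely use GModule (g_module_open() and g_module_symbol()), which wraps the same mechanism.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical entry point that each per-language module would export. */
typedef const char *(*stem_word_fn) (const char *word);

/* Loads the module for `language` only, so the other stemmers are never
 * mapped into memory. The file name pattern and symbol name are made up
 * for this sketch. Returns the dlopen() handle (close it with dlclose()
 * when the language changes) or NULL on failure. */
static void *
load_stemmer (const char *language, stem_word_fn *stem_out)
{
	char path[256];
	void *handle;
	void *sym;
	int n;

	*stem_out = NULL;

	n = snprintf (path, sizeof path, "libtracker-stemmer-%s.so", language);
	if (n < 0 || (size_t) n >= sizeof path)
		return NULL;

	handle = dlopen (path, RTLD_NOW | RTLD_LOCAL);
	if (!handle) {
		fprintf (stderr, "cannot load %s: %s\n", path, dlerror ());
		return NULL;
	}

	sym = dlsym (handle, "stem_word");
	if (!sym) {
		dlclose (handle);
		return NULL;
	}

	/* POSIX guarantees dlsym() results can be converted to function pointers. */
	*stem_out = (stem_word_fn) sym;
	return handle;
}
```

Only the configured language's module is mapped, which is the memory saving being discussed. On glibc older than 2.34 the program also has to be linked with -ldl.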
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: trunk only checks directories (If a file in a directory is modified then the directory's mtime is also altered so no need to check every file) hence startup is much faster. Note: the mtime of the parent directory ONLY is updated. This is not recursive. So if you have /foo/bar/baz/sliff.txt, the mtime of baz/ is updated, not that of bar/ and foo/. This means you _HAVE_ to go into every directory to see if it has a subdirectory with an mtime that has updated. We can do this. Can you guarantee that on EVERY file system type the parent directory mtime is updated when a file changes? I am not 100% sure this is the case. on all major platforms yes (*nix and windows) Hmm. This worries me. How mtime is used across file systems tends to vary slightly and this might come back to bite us. it is for me - it's in the order of 3x slower than trunk at startup What exactly is 3x slower? The crawling? I have been thinking about this. The best solution here to me is to send ALL files/directories to the indexer and let the indexer check the mtime of a directory before deciding to process the files it holds. This should dramatically reduce the DB lookups on startup. But if the slowness is NOT in the indexer, then there is little you can do except increase the throttle. Have you tested it again recently since I made throttle mandatory whenever it is called (i.e. it is 5+config value)? This made a lot of difference for me. -- Regards, Martyn
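Martyn's point that a directory's mtime is not recursive can be made concrete. On POSIX file systems a directory's mtime changes when entries are created, removed or renamed directly inside it, so a startup crawl still has to visit every directory; what it can skip is the per-file database lookup in directories whose mtime matches the stored value. The sketch assumes a hypothetical stored_mtime_fn in place of tracker's real database query, with two stand-in lookups: one for a first run and one for an unchanged tree.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical database lookup: the mtime recorded for `path` at the
 * last crawl, or (time_t) -1 if the directory is unknown. */
typedef time_t (*stored_mtime_fn) (const char *path);

/* Stand-in for an empty database (first run). */
static time_t
lookup_never_indexed (const char *path)
{
	(void) path;
	return (time_t) -1;
}

/* Stand-in for a database that is fully up to date with the disk. */
static time_t
lookup_up_to_date (const char *path)
{
	struct stat st;

	return stat (path, &st) == 0 ? st.st_mtime : (time_t) -1;
}

/* Counts the directories under `root` (inclusive) whose mtime differs
 * from the stored value, i.e. whose direct entries need re-checking.
 * Every subdirectory is still visited, because a change deep in the
 * tree does not propagate to the ancestors' mtimes. This sketch lstat()s
 * each entry to find subdirectories. */
static int
count_changed_dirs (const char *root, stored_mtime_fn lookup)
{
	struct stat st;
	struct dirent *entry;
	DIR *dir;
	int changed;

	if (lstat (root, &st) != 0 || !S_ISDIR (st.st_mode))
		return 0;

	changed = lookup (root) != st.st_mtime;

	dir = opendir (root);
	if (!dir)
		return changed;

	while ((entry = readdir (dir)) != NULL) {
		char child[4096];
		int n;

		if (strcmp (entry->d_name, ".") == 0 ||
		    strcmp (entry->d_name, "..") == 0)
			continue;

		n = snprintf (child, sizeof child, "%s/%s", root, entry->d_name);
		if (n < 0 || (size_t) n >= sizeof child)
			continue;  /* path too long: skipped in this sketch */

		changed += count_changed_dirs (child, lookup);
	}

	closedir (dir);
	return changed;
}
```

On a restart with an up-to-date database the walk still touches every directory but reports nothing to re-examine, so the per-file work disappears.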
Re: [Tracker] more issues with indexer-split
Hi, So I have been reading up on the things that are remaining for merging. This is the list I have so far which I will be working on: * Check the move files/directories issue. I *think* it works. * Fix the get_file_contents() function so it checks for #13 in the first 64KB. * Make private libraries .so files to dynamically load them. * The directory mtime issue on startup. Have I missed anything? -- Regards, Martyn
Re: [Tracker] more issues with indexer-split
On Wed, 2008-09-03 at 12:34 +0100, Martyn Russell wrote: Jamie McCracken wrote: trunk only checks directories (If a file in a directory is modified then the directory's mtime is also altered so no need to check every file) hence startup is much faster. Note: the mtime of the parent directory ONLY is updated. This is not recursive. So if you have /foo/bar/baz/sliff.txt, the mtime of baz/ is updated, not that of bar/ and foo/. This means you _HAVE_ to go into every directory to see if it has a subdirectory with an mtime that has updated. that is what trunk does - it only checks directories (and subdirectories). There's no need to check mtime for a file ever unless the parent directory mtime has changed We can do this. Can you guarantee that on EVERY file system type the parent directory mtime is updated when a file changes? I am not 100% sure this is the case. on all major platforms yes (*nix and windows) Hmm. This worries me. How mtime is used across file systems tends to vary slightly and this might come back to bite us. It's not been a problem in the past for tracker and certainly won't be for our target audience it is for me - it's in the order of 3x slower than trunk at startup What exactly is 3x slower? The crawling? I have been thinking about this. The best solution here to me is to send ALL files/directories to the indexer and let the indexer check the mtime of a directory before deciding to process the files it holds. This should dramatically reduce the DB lookups on startup. But if the slowness is NOT in the indexer, then there is little you can do except increase the throttle. Have you tested it again recently since I made throttle mandatory whenever it is called (i.e. it is 5+config value)? This made a lot of difference for me. trackerd should just pass directories at startup and let the indexer work out what to process. Dbus is not optimised for passing large numbers of strings. Can the current design easily accommodate this? 
jamie
Re: [Tracker] more issues with indexer-split
On Wed, 2008-09-03 at 12:34 +0100, Martyn Russell wrote: Hi, So I have been reading up on the things that are remaining for merging. This is the list I have so far which I will be working on: * Check the move files/directories issue. I *think* it works. check the new directory name can be searched when doing a rename; also check the new name is searchable against all items in that directory * Fix the get_file_contents() function so it checks for #13 in the first 64KB. * Make private libraries .so files to dynamically load them. Also for stemmer - make them dynamically loadable too * The directory mtime issue on startup. also for summary files too - only check em if mtime has changed Have I missed anything? I think that is it. A lot of Prefs don't work but that can wait til after merge. I'm also adding my tracker-fts stuff into that branch so will likely merge when above + my stuff is ready jamie
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: trackerd should just pass directories at startup and let the indexer work out what to process. Dbus is not optimised for passing large numbers of strings. Can the current design easily accommodate this? DBus' optimisation is not an issue here. I can send ALL of my files over quicker than the indexer can mtime check ALL the directories in the database. Yes we can accommodate this. We simply send all files/directories to the indexer and the indexer can check each parent directory first then process the files or discard them if the parent directory mtime is up to date. -- Regards, Martyn
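The approach Martyn proposes, where the daemon sends every path and the indexer discards files whose parent directory is unchanged, might look roughly like this. parent_dir() and file_needs_indexing() are hypothetical helpers, and the stored mtime is assumed to come from a database lookup done by the caller.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Writes the parent directory of `path` into `out`. Returns 0 on
 * success, -1 if `out` is too small. */
static int
parent_dir (const char *path, char *out, size_t out_len)
{
	const char *slash = strrchr (path, '/');
	size_t n;

	if (!slash) {                        /* bare file name */
		if (out_len < 2)
			return -1;
		strcpy (out, ".");
		return 0;
	}

	n = (slash == path) ? 1 : (size_t) (slash - path);  /* keep "/" for root */
	if (n + 1 > out_len)
		return -1;
	memcpy (out, path, n);
	out[n] = '\0';
	return 0;
}

/* Returns 1 if the indexer should process `file`, 0 if it can be
 * discarded because its parent directory's mtime still matches the
 * value stored in the database ((time_t) -1 means never indexed). */
static int
file_needs_indexing (const char *file, time_t stored_parent_mtime)
{
	char parent[4096];
	struct stat st;

	if (parent_dir (file, parent, sizeof parent) != 0)
		return 1;                    /* be conservative */
	if (stat (parent, &st) != 0)
		return 1;
	return st.st_mtime != stored_parent_mtime;
}
```

One caveat worth keeping in mind: rewriting an existing file in place does not change its parent directory's mtime on POSIX file systems, so a filter like this only notices entries being added, removed or renamed. A real implementation would also cache the decision per parent directory, since consecutive paths usually share one.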
Re: [Tracker] more issues with indexer-split
On Wed, 2008-09-03 at 10:32 -0400, Jamie McCracken wrote: On Wed, 2008-09-03 at 12:34 +0100, Martyn Russell wrote: Hi, So I have been reading up on the things that are remaining for merging. This is the list I have so far which I will be working on: * Check the move files/directories issue. I *think* it works. check the new directory name can be searched when doing a rename; also check the new name is searchable against all items in that directory * Fix the get_file_contents() function so it checks for #13 in the first 64KB. also do what trunk does and validate each line. If it fails UTF-8 validation attempt to convert from locale. Best to exit with null if any part fails. I assume the gio stuff handles non-UTF-8? jamie
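Jamie's description of trunk's behaviour (validate each line, convert it from the locale encoding if it is not UTF-8, and return NULL if any line fails) can be sketched as follows. To stay self-contained, the converter is a callback and the example converter assumes the legacy encoding is Latin-1; GLib code would use g_utf8_validate() and g_locale_to_utf8() instead. text_to_utf8() and latin1_to_utf8() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical converter standing in for "convert from locale";
 * returns a malloc'd UTF-8 string, or NULL on failure. */
typedef char *(*to_utf8_fn) (const char *line, size_t len);

/* True if s[0..len) is well-formed UTF-8. */
static bool
utf8_valid (const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		size_t need;
		unsigned int cp;

		if (c < 0x80) { i++; continue; }
		if (c >= 0xC2 && c <= 0xDF)      { need = 1; cp = c & 0x1F; }
		else if (c >= 0xE0 && c <= 0xEF) { need = 2; cp = c & 0x0F; }
		else if (c >= 0xF0 && c <= 0xF4) { need = 3; cp = c & 0x07; }
		else return false;
		if (i + need >= len)
			return false;
		for (size_t k = 1; k <= need; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return false;
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}
		if ((need == 2 && cp < 0x800) || (need == 3 && cp < 0x10000) ||
		    (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
			return false;
		i += need + 1;
	}
	return true;
}

/* Example converter assuming the legacy encoding is ISO-8859-1. */
static char *
latin1_to_utf8 (const char *line, size_t len)
{
	char *out = malloc (len * 2 + 1);
	size_t j = 0;

	if (!out)
		return NULL;
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char) line[i];
		if (c >= 0x80) {
			out[j++] = (char) (0xC0 | (c >> 6));
			c = 0x80 | (c & 0x3F);
		}
		out[j++] = (char) c;
	}
	out[j] = '\0';
	return out;
}

/* Validates `text` line by line; invalid lines go through `convert`.
 * If any line cannot be converted (or `convert` is NULL) the whole
 * result is NULL. Otherwise returns a malloc'd UTF-8 string. */
static char *
text_to_utf8 (const char *text, to_utf8_fn convert)
{
	size_t len = 0, cap = 64;
	char *out = malloc (cap);
	const char *p = text;

	if (!out)
		return NULL;
	while (*p) {
		const char *nl = strchr (p, '\n');
		size_t line_len = nl ? (size_t) (nl - p) + 1 : strlen (p);
		const char *piece = p;
		size_t piece_len = line_len;
		char *converted = NULL;

		if (!utf8_valid ((const unsigned char *) p, line_len)) {
			converted = convert ? convert (p, line_len) : NULL;
			if (!converted) { free (out); return NULL; }
			piece = converted;
			piece_len = strlen (converted);
		}
		if (len + piece_len + 1 > cap) {
			while (len + piece_len + 1 > cap)
				cap *= 2;
			char *bigger = realloc (out, cap);
			if (!bigger) { free (converted); free (out); return NULL; }
			out = bigger;
		}
		memcpy (out + len, piece, piece_len);
		len += piece_len;
		free (converted);
		p += line_len;
	}
	out[len] = '\0';
	return out;
}
```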
Re: [Tracker] more issues with indexer-split
On Wed, 2008-09-03 at 15:31 +0100, Martyn Russell wrote: Jamie McCracken wrote: trackerd should just pass directories at startup and let the indexer work out what to process. Dbus is not optimised for passing large numbers of strings. Can the current design easily accommodate this? DBus' optimisation is not an issue here. I can send ALL of my files over quicker than the indexer can mtime check ALL the directories in the database. DBus only starts to perform badly once the message size grows over 4 KB. In 4 KB you can put quite a lot of URIs. Therefore I don't think we should focus on reducing the number of URIs we send from the daemon to the indexer. Yes we can accommodate this. We simply send all files/directories to the indexer and the indexer can check each parent directory first then process the files or discard them if the parent directory mtime is up to date. -- Philip Van Hoof, freelance software developer home: me at pvanhoof dot be gnome: pvanhoof at gnome dot org http://pvanhoof.be/blog http://codeminded.be
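If the daemon does send individual paths, Philip's 4 KB observation suggests grouping them so each message stays under that size. A sketch with a hypothetical split_into_batches() helper; it counts only string bytes plus terminators and ignores D-Bus header and alignment overhead, so the real payload would be slightly larger.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Threshold Philip mentions; real D-Bus framing overhead is ignored. */
#define MAX_BATCH_BYTES 4096

/* Splits `uris` into consecutive batches whose summed string lengths
 * (plus one terminating NUL each) stay within MAX_BATCH_BYTES, writing
 * the index of the first URI of each batch into `batch_starts`, which
 * must have room for `n_uris` entries. A URI larger than the limit
 * still gets a batch of its own. Returns the number of batches. */
static size_t
split_into_batches (const char *const *uris, size_t n_uris,
                    size_t *batch_starts)
{
	size_t n_batches = 0;
	size_t used = 0;

	for (size_t i = 0; i < n_uris; i++) {
		size_t cost = strlen (uris[i]) + 1;

		/* Start a new batch for the first URI or when this one would overflow. */
		if (n_batches == 0 || used + cost > MAX_BATCH_BYTES) {
			batch_starts[n_batches++] = i;
			used = 0;
		}
		used += cost;
	}

	return n_batches;
}
```

Batch k then covers uris[batch_starts[k]] up to, but not including, uris[batch_starts[k + 1]] (or the end of the array for the last batch), and each batch would become one message to the indexer.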
Re: [Tracker] more issues with indexer-split
On Wed, 2008-09-03 at 16:35 +0200, Philip Van Hoof wrote: On Wed, 2008-09-03 at 15:31 +0100, Martyn Russell wrote: Jamie McCracken wrote: trackerd should just pass directories at startup and let the indexer work out what to process. Dbus is not optimised for passing large numbers of strings. Can the current design easily accommodate this? DBus' optimisation is not an issue here. I can send ALL of my files over quicker than the indexer can mtime check ALL the directories in the database. DBus only starts to perform badly once the message size grows over 4 KB. In 4 KB you can put quite a lot of URIs. Therefore I don't think we should focus on reducing the number of URIs we send from the daemon to the indexer. ok but let's see how it performs first. I want startup of a previously indexed machine to be as good as or close to trunk jamie
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: Could we also reduce memory usage by not statically linking to the private libs libtracker-common and libtracker-db? Those libraries should not be available for public use. Before doing so, each API would have to be: a) Documented b) Checked it needs to be public c) Versioned d) ... This is a lot of work and I don't think it is worth it. I haven't looked at the footprints myself though. currently my FTS module and the file-indexer-module are ~ 1MB in size due mostly to linking with them and I'm sure the size of trackerd and tracker-indexer could be made smaller too with only one instance of those libs in memory How does the memory footprint compare to the old tracker? -- Regards, Martyn
Re: [Tracker] more issues with indexer-split
On Tue, 2008-09-02 at 12:17 +0100, Martyn Russell wrote: Jamie McCracken wrote: Finding more performance issues on an up to date indexed home directory, the next restart of trackerd checks every single file to see if it's up to date - why? Because we redesigned the whole code base and haven't finished optimising it yet. fair enough trunk only checks directories (If a file in a directory is modified then the directory's mtime is also altered so no need to check every file) hence startup is much faster. We can do this. Can you guarantee that on EVERY file system type the parent directory mtime is updated when a file changes? I am not 100% sure this is the case. on all major platforms yes (*nix and windows) This needs to be restored as the performance of indexer-split is horrendous at startup It isn't that bad. it is for me - it's in the order of 3x slower than trunk at startup jamie
Re: [Tracker] more issues with indexer-split
On Tue, 2008-09-02 at 12:23 +0100, Martyn Russell wrote: Jamie McCracken wrote: Could we also reduce memory usage by not statically linking to the private libs libtracker-common and libtracker-db? Those libraries should not be available for public use. Before doing so, each API would have to be: a) Documented b) Checked it needs to be public c) Versioned d) ... This is a lot of work and I don't think it is worth it. I haven't looked at the footprints myself though. why would we do all that? we would not be exporting the headers for those libs so no other apps outside of tracker source tree will be able to use it effectively surely there are some examples of private libs that are not statically linked? currently my FTS module and the file-indexer-module are ~ 1MB in size due mostly to linking with them and I'm sure the size of trackerd and tracker-indexer could be made smaller too with only one instance of those libs in memory How does the memory footprint compare to the old tracker? resident memory is a lot steeper and that's even before it's started indexing
Re: [Tracker] more issues with indexer-split
Finding more performance issues on an up to date indexed home directory, the next restart of trackerd checks every single file to see if it's up to date - why? trunk only checks directories (If a file in a directory is modified then the directory's mtime is also altered so no need to check every file) hence startup is much faster. This needs to be restored as the performance of indexer-split is horrendous at startup ps same with summary files - only need to check if mtime differs in tracker's database jamie On Fri, 2008-08-22 at 15:11 +0200, Philip Van Hoof wrote: On Fri, 2008-08-22 at 11:42 +0100, Martyn Russell wrote: Jamie McCracken wrote: also search is still blocked for 10-20 seconds even when indexer is not active - why does it take so long to pause the indexer? It's completely unusable like that. the indexer must pause in under a second. We fixed that bug yesterday. I am not sure if it was the double free you added to your commit ;) or the fix Carlos added where the index was being reopened. (I think) It was the commit of the transaction in the indexer, when the indexer is asked to pause. cd branches/indexer-split svn diff -r 2134:2135
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: still getting lots of serious problems when running email-contents.db does not grow - why are we not saving email body contents here? Also what happened to email-index.db? Why have you combined files and emails index? Not sure why this was decided, but I think amongst the refactoring the old code wasn't reinstated so we got rid of what was left over (of the email-index.db types, etc). Anyway, this has been re-added. remember the bigger the index the slower it is to update which is why we have separate indexes for emails and files (should be file-index.db and email-index.db) Yep. The fix seems to alleviate this significantly. it also breaks tracker-search-tool as it tries to show emails category twice I am not sure that's the reason it does that actually, but still.
[EMAIL PROTECTED]:~/.cache/tracker$ ls -l
total 152428
-rw-r--r-- 1 jamie jamie 12288 2008-08-20 23:41 email-contents.db
-rw-r--r-- 1 jamie jamie 13492224 2008-08-20 23:51 email-meta.db
-rw-r--r-- 1 jamie jamie 107216 2008-08-20 23:51 email-meta.db-journal
-rw-r--r-- 1 jamie jamie 42377216 2008-08-20 23:50 file-contents.db
-rw-r--r-- 1 jamie jamie 17080320 2008-08-20 23:50 file-meta.db
-rw-r--r-- 1 jamie jamie 80308976 2008-08-20 23:51 index.db
-rw-r--r-- 1 jamie jamie 2359292 2008-08-20 23:41 index-update.db
-rw-r--r-- 1 jamie jamie 151552 2008-08-20 23:41 xesam.db
Now we have email-index.db and file-index.db:
[EMAIL PROTECTED]:~$ ls -l /home/martyn/.cache/tracker/
total 680604
-rw-r--r-- 1 martyn martyn 729088 2008-08-22 11:39 email-contents.db
-rw-r--r-- 1 martyn martyn 74114696 2008-08-22 11:42 email-index.db
-rw-r--r-- 1 martyn martyn 114688 2008-08-22 11:39 email-meta.db
-rw-r--r-- 1 martyn martyn 177500160 2008-08-22 11:29 file-contents.db
-rw-r--r-- 1 martyn martyn 387498224 2008-08-22 11:30 file-index.db
-rw-r--r-- 1 martyn martyn 2359292 2008-08-22 10:59 file-index-update.db
-rw-r--r-- 1 martyn martyn 54284288 2008-08-22 11:30 file-meta.db
-rw-r--r-- 1 martyn martyn 151552 2008-08-22 10:59 xesam.db
also after renaming a folder I could not search for the new name I need to check this. I have not had time yet and I leave to go on vacation tonight so I doubt I will be able to. I can leave this task to Carlos, Ivan, Philip and Mikael to look at. also search is still blocked for 10-20 seconds even when indexer is not active - why does it take so long to pause the indexer? It's completely unusable like that. the indexer must pause in under a second. We fixed that bug yesterday. I am not sure if it was the double free you added to your commit ;) or the fix Carlos added where the index was being reopened. Either way, it seems infinitely better now. -- Regards, Martyn
Re: [Tracker] more issues with indexer-split
On Fri, 2008-08-22 at 11:42 +0100, Martyn Russell wrote: Jamie McCracken wrote: also search is still blocked for 10-20 seconds even when indexer is not active - why does it take so long to pause the indexer? It's completely unusable like that. the indexer must pause in under a second. We fixed that bug yesterday. I am not sure if it was the double free you added to your commit ;) or the fix Carlos added where the index was being reopened. (I think) It was the commit of the transaction in the indexer, when the indexer is asked to pause. cd branches/indexer-split svn diff -r 2134:2135 -- Philip Van Hoof, freelance software developer home: me at pvanhoof dot be gnome: pvanhoof at gnome dot org http://pvanhoof.be/blog http://codeminded.be
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: 1) trackerd: Handle file moves - update files in a directory recursively when a directory is renamed/moved (need to pause indexer before updating - watch out!). Likewise re-enable update of index from trackerd as it's needed for tagging and other user metadata This should be committed now. From my brief testing it seemed to work. -- Regards, Martyn
Re: [Tracker] more issues with indexer-split
On Wed, 2008-08-20 at 12:49 +0100, Martyn Russell wrote: Jamie McCracken wrote: 1) trackerd: Handle file moves - update files in a directory recursively when a directory is renamed/moved (need to pause indexer before updating - watch out!). Likewise re-enable update of index from trackerd as it's needed for tagging and other user metadata This should be committed now. From my brief testing it seemed to work. ok thanks, I will give it a spin tonight and providing nothing major is wrong we can merge tomorrow jamie
Re: [Tracker] more issues with indexer-split
On Wed, 2008-08-20 at 12:49 +0100, Martyn Russell wrote: Jamie McCracken wrote: 1) trackerd: Handle file moves - update files in a directory recursively when a directory is renamed/moved (need to pause indexer before updating - watch out!). Likewise re-enable update of index from trackerd as it's needed for tagging and other user metadata This should be committed now. From my brief testing it seemed to work. still getting lots of serious problems when running email-contents.db does not grow - why are we not saving email body contents here? Also what happened to email-index.db? Why have you combined files and emails index? remember the bigger the index the slower it is to update which is why we have separate indexes for emails and files (should be file-index.db and email-index.db) it also breaks tracker-search-tool as it tries to show emails category twice
[EMAIL PROTECTED]:~/.cache/tracker$ ls -l
total 152428
-rw-r--r-- 1 jamie jamie 12288 2008-08-20 23:41 email-contents.db
-rw-r--r-- 1 jamie jamie 13492224 2008-08-20 23:51 email-meta.db
-rw-r--r-- 1 jamie jamie 107216 2008-08-20 23:51 email-meta.db-journal
-rw-r--r-- 1 jamie jamie 42377216 2008-08-20 23:50 file-contents.db
-rw-r--r-- 1 jamie jamie 17080320 2008-08-20 23:50 file-meta.db
-rw-r--r-- 1 jamie jamie 80308976 2008-08-20 23:51 index.db
-rw-r--r-- 1 jamie jamie 2359292 2008-08-20 23:41 index-update.db
-rw-r--r-- 1 jamie jamie 151552 2008-08-20 23:41 xesam.db
also after renaming a folder I could not search for the new name also search is still blocked for 10-20 seconds even when indexer is not active - why does it take so long to pause the indexer? It's completely unusable like that. the indexer must pause in under a second. Can you fix above please before merging (sorry I missed it before) thanks jamie
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: On Fri, 2008-08-15 at 11:01 +0100, Martyn Russell wrote: Jamie McCracken wrote: As I will be working quite extensively on trunk post merge I require any major changes to be done ASAP list of changes required prior to merge (in order of priority) - all of these already exist and work in trunk: 1) trackerd: Handle file moves - update files in a directory recursively when a directory is renamed/moved (need to pause indexer before updating - watch out!). Likewise re-enable update of index from trackerd as it's needed for tagging and other user metadata Hi, This is done in the indexer; the daemon, however, currently has no way of knowing about files linked by moves. Instead, GIO gives us DELETED and CREATED events. That is quite unacceptable I think. I have created a bug report about this: http://bugzilla.gnome.org/show_bug.cgi?id=547890 We can in the meantime perhaps add some glue by checking the md5sum of the 2 files to see if they are the same and the events occur within the same 2 seconds perhaps? I would rather not do this. But it might be necessary for the time being. can we fork the gio monitor code and inline it into our source tree then? when updated glib with that functionality is available we can swap it out Not really. The GIO code isn't the easiest code to fork. I think doing that would take longer than finding another, easier solution. I am investigating this. -- Regards, Martyn
Re: [Tracker] more issues with indexer-split
On Mon, 2008-08-18 at 10:05 +0100, Martyn Russell wrote: Jamie McCracken wrote: On Fri, 2008-08-15 at 11:01 +0100, Martyn Russell wrote: Jamie McCracken wrote: As I will be working quite extensively on trunk post merge I require any major changes to be done ASAP list of changes required prior to merge (in order of priority) - all of these already exist and work in trunk: 1) trackerd: Handle file moves - update files in a directory recursively when a directory is renamed/moved (need to pause indexer before updating - watch out!). Likewise re-enable update of index from trackerd as it's needed for tagging and other user metadata Hi, This is done in the indexer; the daemon, however, currently has no way of knowing about files linked by moves. Instead, GIO gives us DELETED and CREATED events. That is quite unacceptable I think. I have created a bug report about this: http://bugzilla.gnome.org/show_bug.cgi?id=547890 We can in the meantime perhaps add some glue by checking the md5sum of the 2 files to see if they are the same and the events occur within the same 2 seconds perhaps? I would rather not do this. But it might be necessary for the time being. can we fork the gio monitor code and inline it into our source tree then? when updated glib with that functionality is available we can swap it out Not really. The GIO code isn't the easiest code to fork. I think doing that would take longer than finding another, easier solution. I am investigating this. I meant just the monitor code not the whole of GIO jamie
Re: [Tracker] more issues with indexer-split
On Fri, 2008-08-15 at 11:01 +0100, Martyn Russell wrote: Jamie McCracken wrote: As I will be working quite extensively on trunk post merge I require any major changes to be done ASAP list of changes required prior to merge (in order of priority) - all of these already exist and work in trunk: 1) trackerd: Handle file moves - update files in a directory recursively when a directory is renamed/moved (need to pause indexer before updating - watch out!). Likewise re-enable update of index from trackerd as it's needed for tagging and other user metadata Hi, This is done in the indexer; the daemon, however, currently has no way of knowing about files linked by moves. Instead, GIO gives us DELETED and CREATED events. That is quite unacceptable I think. I have created a bug report about this: http://bugzilla.gnome.org/show_bug.cgi?id=547890 We can in the meantime perhaps add some glue by checking the md5sum of the 2 files to see if they are the same and the events occur within the same 2 seconds perhaps? I would rather not do this. But it might be necessary for the time being. can we fork the gio monitor code and inline it into our source tree then? when updated glib with that functionality is available we can swap it out jamie
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: As I will be working quite extensively on trunk post merge I require any major changes to be done ASAP list of changes required prior to merge (in order of priority) - all of these already exist and work in trunk: 1) trackerd: Handle file moves - update files in a directory recursively when a directory is renamed/moved (need to pause indexer before updating - watch out!). Likewise re-enable update of index from trackerd as it's needed for tagging and other user metadata Hi, This is done in the indexer; the daemon, however, currently has no way of knowing about files linked by moves. Instead, GIO gives us DELETED and CREATED events. That is quite unacceptable I think. I have created a bug report about this: http://bugzilla.gnome.org/show_bug.cgi?id=547890 We can in the meantime perhaps add some glue by checking the md5sum of the 2 files to see if they are the same and the events occur within the same 2 seconds perhaps? I would rather not do this. But it might be necessary for the time being. -- Regards, Martyn
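The stop-gap Martyn floats, pairing a DELETED event with a CREATED event that arrives within two seconds and has the same MD5 sum, reduces to a small predicate. The file_event struct and looks_like_move() are hypothetical; the deleted file's checksum would have to come from the database, because the file no longer exists when the event is delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* Hypothetical event record. `md5` is a hex digest; for the DELETED
 * side it must come from the database, since the file is already gone. */
struct file_event {
	time_t when;
	char md5[33];
};

/* Window taken from the suggestion in the thread. */
#define MOVE_WINDOW_SECONDS 2

/* Treats a DELETED event followed, within the window, by a CREATED
 * event with identical content as a move. */
static bool
looks_like_move (const struct file_event *deleted,
                 const struct file_event *created)
{
	double gap = difftime (created->when, deleted->when);

	if (gap < 0 || gap > MOVE_WINDOW_SECONDS)
		return false;
	return strcmp (deleted->md5, created->md5) == 0;
}
```

The obvious false positive is an identical copy being created just after the original is deleted, which is one reason the heuristic is only meant as temporary glue until GIO reports moves.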
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: On Wed, 2008-08-13 at 19:30 +0200, Carlos Garnacho wrote: As far as I see, for mbox you're storing the offset in the stream: msg_offset = g_mime_parser_tell (mf->parser); mail_msg->offset = msg_offset; For IMAP, I just get 0 in the Services table, also didn't get to see any code to do this. imap stores message count too - it's a count rather than a byte offset As Carlos says, this code is NOT working in TRUNK for IMAP. So this whole argument is moot. no when junk/deleted email is encountered during the start up scan its UID is checked against that table (JunkMails) to see if we already know about it. If it's not in that table then we add it and then delete it from our index. Ergo it's more efficient than what you have The whole idea of keeping a separate table for deleted/junk email sounds really inefficient to me. I have quite a bit and I get quite a bit every day, that's a lot of extra processing. Surely it is MORE processing than the current inefficiencies you are outlining with our current design? Could you tell me where that code is? The only users for InsertJunk/LookupJunk (the stored procedures) are tracker_db_email_insert_junk() and tracker_db_email_lookup_junk(), the former is also the only user of the latter, and it doesn't do what you mention. The only place I see where it could delete emails from the DB for Evolution is check_summary_file(), and tracker_db_email_delete_email() seems to be called unconditionally for any junk/deleted message found. the way it should work is as described above I had tested it and it works (deleted and junk emails are pruned on next restart of trackerd) What you're saying here doesn't make a lot of sense to me. It sounds like you're saying that if mail is marked as junk or deleted you don't want to update the index until we restart the daemon? So people will still be searching and finding junk until trackerd is restarted? That doesn't sound right to me. Or did you mean something else? 
How do you currently tell which emails are new in the summary file? Without storing the count you cannot know without verifying each email exists in the services table (which would obviously be unacceptable performance-wise) You haven't answered the question. Where is the code? the trunk way is faster so I would prefer that restored TRUNK doesn't work as you think it does. If you bear with me, I'd prefer to try a few optimizations before having to add special cases. well not doing the junk/deletion check every time the summary file changes must obviously be faster? Plus Carlos is right, this code can probably be optimised much more than it is now. It has just been written to get working so far. Sure, but it's also more beneficial for users if tracker DB contents are up to date with the actual data. Also, IMHO adding special cases like this would break a design that makes tracker really extensible and easy to develop for. Carlos has spent a lot of time designing this. I spoke further with him about it too, we could change the way we do things now to use GTypeModule and GInterface to make it extensible, but that will take a few days at least to do. This issue in general is not a show stopper, it is a performance issue, the performance issue we have with index.db (which you say you will fix next week by using SQLite with FTS) is much more of an issue than this by far. I would suggest we merge and resolve these on trunk so you can get on Jamie. that can be done easily - for quick synch test just check last known UID in summary file (using stored message count) exists in services - if it does not then you have a count mismatch and a resync is required I don't claim to know much about the UID, but what if you receive a mail and delete a mail - won't the count then be the same? Resulting in your count check for a resync breaking? 
this can be done whenever a new email arrives as it's not expensive suggest having a resync method to do above and a check_synch one to test it's ok We could have this soon, but it won't be today unfortunately. Carlos is on vacation. -- Regards, Martyn
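The quick sync test Jamie suggests, checking that the UID at position stored_count - 1 in the summary file is already in the Services table, is sketched below with the Services table modelled as a sorted array. summary_looks_in_sync() is a hypothetical name, and UIDs are assumed to be unsigned integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static int
cmp_uint (const void *a, const void *b)
{
	unsigned int x = *(const unsigned int *) a;
	unsigned int y = *(const unsigned int *) b;

	return (x > y) - (x < y);
}

/* Returns true if the index looks in sync with the summary file:
 * the summary still holds at least `stored_count` messages and the UID
 * at position stored_count - 1 is already known (present in the sorted
 * `known_uids` array standing in for the Services table). Messages past
 * that position are new. Returns false when a full resync is needed. */
static bool
summary_looks_in_sync (const unsigned int *summary_uids, size_t n_summary,
                       size_t stored_count,
                       const unsigned int *known_uids, size_t n_known)
{
	if (stored_count == 0)
		return n_summary == 0 || n_known == 0;
	if (n_summary < stored_count)
		return false;  /* messages were expunged */

	return bsearch (&summary_uids[stored_count - 1], known_uids, n_known,
	                sizeof *known_uids, cmp_uint) != NULL;
}
```

With the test data used here, the delete-one-receive-one case Martyn raises is caught because the UID at the stored position changes. A message that is merely flagged as junk or deleted while still present in the summary leaves the check passing, though, which is part of Carlos's argument for walking the summary.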
Re: [Tracker] more issues with indexer-split
On Fri, 2008-08-15 at 11:01 +0100, Martyn Russell wrote: Jamie McCracken wrote: As I will be working quite extensively on trunk post merge I require any major changes to be done ASAP list of changes required prior to merge (in order of priority) - all of these already exist and work in trunk: 1) trackerd: Handle file moves - update files in a directory recursively when a directory is renamed/moved (need to pause indexer before updating - watch out!). Likewise re-enable update of index from trackerd as it's needed for tagging and other user metadata Hi, This is done in the indexer; the daemon, however, currently has no way of knowing about files linked by moves. Instead, GIO gives us DELETED and CREATED events. That is quite unacceptable I think. I have created a bug report about this: http://bugzilla.gnome.org/show_bug.cgi?id=547890 We can in the meantime perhaps add some glue by checking the md5sum of the 2 files to see if they are the same and the events occur within the same 2 seconds perhaps? I would rather not do this. But it might be necessary for the time being. why not use the native inotify and just use gio file monitoring for the others? when gio has the new functionality we can then replace inotify with it jamie
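Jamie's alternative, using inotify directly, works because Linux reports a rename within watched directories as an IN_MOVED_FROM event followed by an IN_MOVED_TO event carrying the same cookie. The pairing step might look like this; the watch_event struct is a simplified stand-in for struct inotify_event, find_move_target() is a hypothetical helper, and reading events from the inotify file descriptor is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/inotify.h>

/* The two fields of struct inotify_event that matter for pairing; the
 * real struct also carries the watch descriptor and the file name. */
struct watch_event {
	uint32_t mask;
	uint32_t cookie;
};

/* Given the index of an IN_MOVED_FROM event, returns the index of the
 * IN_MOVED_TO event with the same cookie, or -1 if there is none (the
 * file left the watched tree and should be treated as deleted). */
static long
find_move_target (const struct watch_event *events, size_t n, size_t from)
{
	if (from >= n || !(events[from].mask & IN_MOVED_FROM))
		return -1;

	for (size_t i = from + 1; i < n; i++) {
		if ((events[i].mask & IN_MOVED_TO) &&
		    events[i].cookie == events[from].cookie)
			return (long) i;
	}

	return -1;
}
```

Unlike the checksum heuristic, the cookie ties the two halves of a rename together exactly, so no file contents need to be compared.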
Re: [Tracker] more issues with indexer-split
Hi! On Tue, 2008-08-12 at 14:18 -0400, Jamie McCracken wrote: snip that sounds inefficient - trunk only ever checked for existing deleted or junk emails at startup because iterating through all emails in the summary files is expensive. From what I've read in trunk code, you still iterate through all the mails in the summary in check_summary_file(), and you will have to iterate over them again later to index new messages, etc... As far as I know, it's quite unavoidable to parse the summaries again, since under some circumstances Message IDs could be reused, which would leave you with inconsistent data in the DBs. Even if it isn't, expunging a folder would render any stored offset for the summary file useless (even dangerous). Besides, when testing summary parsing, I remember it was pretty fast (like 2-3 seconds for a ~6500-email summary), of course without inserting to DBs or doing message body or attachment sniffing, which is more or less what should happen if the junk/deleted flag is set. the use of a separate junk email table meant lookups were confined to that table and not the services table so it was faster when the number of emails was high. You mean the JunkMails table in email-meta.db? As far as I see, this table is just looked up to make sure there aren't duplicates when inserting. And in the end, you still have to look up/modify the Services table, even if the junk mail wasn't there. we should also avoid doing this whenever the summary file changes which is why we stored an offset in trunk so we skip over messages to get to the new ones only when summary files change or do nothing if no new ones are present. As said above, I think there are pretty good reasons to avoid this. the trunk way is faster so i would prefer that restored. If you bear with me, I'd prefer to try a few optimizations before having to add special cases.
Regards, Carlos thanks jamie -- Carlos Garnacho Imendio AB - Expert solutions in GTK+ http://www.imendio.com
Re: [Tracker] more issues with indexer-split
On Wed, 2008-08-13 at 17:12 +0200, Carlos Garnacho wrote: Hi! On Tue, 2008-08-12 at 14:18 -0400, Jamie McCracken wrote: snip that sounds inefficient - trunk only ever checked for existing deleted or junk emails at startup because iterating through all emails in the summary files is expensive. From what I've read in trunk code, you still iterate through all the mails in the summary in check_summary_file(), and you will have to iterate over them again later to index new messages, etc... yes, but when we are not doing the startup check, we are skipping, so it's faster and we are not stopping at any deleted or junk email and checking it. As far as I know, it's quite unavoidable to parse the summaries again, since under some circumstances Message IDs could be reused, which would leave you with inconsistent data in the DBs. Even if it isn't, expunging a folder would render any stored offset for the summary file useless (even dangerous). true but we would get a deletion from inotify of the summary file if that was the case. It's not a byte offset but a message count - so we skip x messages to get the new ones (similar to what beagle does). Besides, when testing summary parsing, I remember it was pretty fast (like 2-3 seconds for a ~6500-email summary), of course without inserting to DBs or doing message body or attachment sniffing, which is more or less what should happen if the junk/deleted flag is set. with 100,000+ emails it's quite noticeable. the use of a separate junk email table meant lookups were confined to that table and not the services table so it was faster when the number of emails was high. You mean the JunkMails table in email-meta.db? As far as I see, this table is just looked up to make sure there aren't duplicates when inserting. And in the end, you still have to look up/modify the Services table, even if the junk mail wasn't there.
no, when a junk/deleted email is encountered during the startup scan, its UID is checked against that table (JunkMails) to see if we already know about it. If it's not in that table then we add it and then delete it from our index. Ergo it's more efficient than what you have. we should also avoid doing this whenever the summary file changes which is why we stored an offset in trunk so we skip over messages to get to the new ones only when summary files change or do nothing if no new ones are present. As said above, I think there are pretty good reasons to avoid this. the trunk way is faster so i would prefer that restored. If you bear with me, I'd prefer to try a few optimizations before having to add special cases. well not doing the junk/deletion check every time the summary file changes must obviously be faster? jamie
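The trunk startup behaviour Jamie describes (Carlos notes elsewhere in the thread that he could not find code doing exactly this) can be sketched as follows. The JunkMails table is modelled as an in-memory array, UIDs as unsigned integers, and startup_junk_scan() is a hypothetical name; the optional callback stands in for deleting the message from the index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_JUNK 64   /* arbitrary capacity for this sketch */

typedef struct {
    unsigned uid;
    bool     junk_or_deleted;   /* flag as read from the summary file */
} SummaryEntry;

/* Stand-in for the JunkMails table: UIDs already known to be junk/deleted. */
typedef struct {
    unsigned uids[MAX_JUNK];
    size_t   n;
} JunkTable;

static bool junk_lookup(const JunkTable *t, unsigned uid)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->uids[i] == uid)
            return true;
    return false;
}

static void junk_insert(JunkTable *t, unsigned uid)
{
    assert(t->n < MAX_JUNK);
    t->uids[t->n++] = uid;
}

/* Startup scan as described: only junk/deleted entries trigger a lookup,
 * and only those not already in the junk table are recorded there and
 * deleted from the index. Returns the number of index deletions. */
static size_t startup_junk_scan(const SummaryEntry *summary, size_t n,
                                JunkTable *junk,
                                void (*delete_from_index)(unsigned uid))
{
    size_t deleted = 0;

    for (size_t i = 0; i < n; i++) {
        if (!summary[i].junk_or_deleted)
            continue;
        if (junk_lookup(junk, summary[i].uid))
            continue;                    /* handled on an earlier startup */
        junk_insert(junk, summary[i].uid);
        if (delete_from_index != NULL)
            delete_from_index(summary[i].uid);
        deleted++;
    }
    return deleted;
}
```

Because UIDs already recorded in the junk table are skipped, a second startup scan over an unchanged summary performs no deletions; the remaining cost is one junk-table lookup per flagged message on every startup.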
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: On Wed, 2008-08-13 at 17:12 +0200, Carlos Garnacho wrote: Hi! Hi :) On Tue, 2008-08-12 at 14:18 -0400, Jamie McCracken wrote: snip that sounds inefficient - trunk only ever checked for existing deleted or junk emails at startup because iterating through all emails in the summary files is expensive. From what I've read in trunk code, you still iterate through all the mails in the summary in check_summary_file(), and you will have to iterate over them again later to index new messages, etc... yes, but when we are not doing the startup check, we are skipping, so it's faster and we are not stopping at any deleted or junk email and checking it. Of course it is faster, but that doesn't mean we are completely synchronised - unless I missed the point. If you had an email in the summary file before and then you mark it as deleted or junk, the summary file is out of date. If this is done when tracker isn't running or on another machine, etc., you would _HAVE_ to read the summary file again on startup to make sure you were synchronised. At least that's how I understand it. As far as I know, it's quite unavoidable to parse the summaries again, since under some circumstances Message IDs could be reused, which would leave you with inconsistent data in the DBs. Even if it isn't, expunging a folder would render any stored offset for the summary file useless (even dangerous). true but we would get a deletion from inotify of the summary file if that was the case. It's not a byte offset but a message count - so we skip x messages to get the new ones (similar to what beagle does). As I illustrated above, you can't guarantee either that: 1) Tracker is running all the time, or 2) email isn't deleted/etc. from another machine/client/webserver/etc.
Besides, when testing summary parsing, I remember it was pretty fast (like 2-3 seconds for a ~6500-email summary), of course without inserting to DBs or doing message body or attachment sniffing, which is more or less what should happen if the junk/deleted flag is set. with 100,000+ emails it's quite noticeable. The difference is not really an issue. Most people don't have that many emails. Those that do can expect to wait a bit longer. Really, the difference you are arguing about here is insignificant. If you have to wait another 30 seconds because you have a ridiculous number of emails, I don't think that is a problem, especially if you are guaranteeing synchronicity. the use of a separate junk email table meant lookups were confined to that table and not the services table so it was faster when the number of emails was high. You mean the JunkMails table in email-meta.db? As far as I see, this table is just looked up to make sure there aren't duplicates when inserting. And in the end, you still have to look up/modify the Services table, even if the junk mail wasn't there. no, when a junk/deleted email is encountered during the startup scan, its UID is checked against that table (JunkMails) to see if we already know about it. If it's not in that table then we add it and then delete it from our index. Ergo it's more efficient than what you have. So if you remove this step completely and just check the index on startup, shouldn't it be JUST as efficient? Checking a table for junk and keeping that synchronised should be just as wasteful as scanning the summary file, I would imagine. -- Regards, Martyn
Re: [Tracker] more issues with indexer-split
On Wed, 2008-08-13 at 17:00 +0100, Martyn Russell wrote: Jamie McCracken wrote: On Wed, 2008-08-13 at 17:12 +0200, Carlos Garnacho wrote: Hi! Hi :) On Tue, 2008-08-12 at 14:18 -0400, Jamie McCracken wrote: snip that sounds inefficient - trunk only ever checked for existing deleted or junk emails at startup because iterating through all emails in the summary files is expensive. From what I've read in trunk code, you still iterate through all the mails in the summary in check_summary_file(), and you will have to iterate over them again later to index new messages, etc... yes, but when we are not doing the startup check, we are skipping, so it's faster and we are not stopping at any deleted or junk email and checking it. Of course it is faster, but that doesn't mean we are completely synchronised - unless I missed the point. If you had an email in the summary file before and then you mark it as deleted or junk, the summary file is out of date. If this is done when tracker isn't running or on another machine, etc., you would _HAVE_ to read the summary file again on startup to make sure you were synchronised. At least that's how I understand it. As far as I know, it's quite unavoidable to parse the summaries again, since under some circumstances Message IDs could be reused, which would leave you with inconsistent data in the DBs. Even if it isn't, expunging a folder would render any stored offset for the summary file useless (even dangerous). true but we would get a deletion from inotify of the summary file if that was the case. It's not a byte offset but a message count - so we skip x messages to get the new ones (similar to what beagle does). As I illustrated above, you can't guarantee either that: 1) Tracker is running all the time, or 2) email isn't deleted/etc. from another machine/client/webserver/etc.
such synchronicity can be done at startup - we will know if something is modified. there's no need to attempt it every time an email arrives, which is my point. it was optimised before but the changes in your branch have removed this - pls revert. jamie
Re: [Tracker] more issues with indexer-split
Hi :) On Wed, 2008-08-13 at 11:47 -0400, Jamie McCracken wrote: On Wed, 2008-08-13 at 17:12 +0200, Carlos Garnacho wrote: Hi! On Tue, 2008-08-12 at 14:18 -0400, Jamie McCracken wrote: snip that sounds inefficient - trunk only ever checked for existing deleted or junk emails at startup because iterating through all emails in the summary files is expensive. From what I've read in trunk code, you still iterate through all the mails in the summary in check_summary_file(), and you will have to iterate over them again later to index new messages, etc... yes, but when we are not doing the startup check, we are skipping, so it's faster and we are not stopping at any deleted or junk email and checking it. How much time do you plan to save doing fseek() instead of fread()? I've updated the code in indexer-split to just read over the message when it gets to a deleted/junk message, and read_summary() could be changed to do fseek() if no data is requested. That makes the indexer-split code do one pass where trunk does two. Less disk head movement I'd say :) Also, take into account that you're forced to fread() even if you're skipping a message, since you have to know the strings' lengths to be able to skip them. As far as I know, it's quite unavoidable to parse the summaries again, since under some circumstances Message IDs could be reused, which would leave you with inconsistent data in the DBs. Even if it isn't, expunging a folder would render any stored offset for the summary file useless (even dangerous). true but we would get a deletion from inotify of the summary file if that was the case. It's not a byte offset but a message count - so we skip x messages to get the new ones (similar to what beagle does). By Expunge I meant telling $MAIL_APP to get rid of deleted messages in the mail folder; in Evolution that would change the summary file and mess up offsets for sure.
As far as I see, for mbox you're storing the offset in the stream: msg_offset = g_mime_parser_tell (mf->parser); mail_msg->offset = msg_offset; For IMAP, I just get 0 in the Services table, and I didn't see any code to do this either. Besides, when testing summary parsing, I remember it was pretty fast (like 2-3 seconds for a ~6500-email summary), of course without inserting to DBs or doing message body or attachment sniffing, which is more or less what should happen if the junk/deleted flag is set. with 100,000+ emails it's quite noticeable. the use of a separate junk email table meant lookups were confined to that table and not the services table so it was faster when the number of emails was high. You mean the JunkMails table in email-meta.db? As far as I see, this table is just looked up to make sure there aren't duplicates when inserting. And in the end, you still have to look up/modify the Services table, even if the junk mail wasn't there. no, when a junk/deleted email is encountered during the startup scan, its UID is checked against that table (JunkMails) to see if we already know about it. If it's not in that table then we add it and then delete it from our index. Ergo it's more efficient than what you have. Could you tell me where that code is? The only users of InsertJunk/LookupJunk (the stored procedures) are tracker_db_email_insert_junk() and tracker_db_email_lookup_junk(); the former is also the only user of the latter, and it doesn't do what you mention. The only place I see where it could delete emails from the DB for Evolution is check_summary_file(), and tracker_db_email_delete_email() seems to be called unconditionally for any junk/deleted message found. we should also avoid doing this whenever the summary file changes which is why we stored an offset in trunk so we skip over messages to get to the new ones only when summary files change or do nothing if no new ones are present. As said above, I think there are pretty good reasons to avoid this.
the trunk way is faster so i would prefer that restored. If you bear with me, I'd prefer to try a few optimizations before having to add special cases. well not doing the junk/deletion check every time the summary file changes must obviously be faster? Sure, but it's also more beneficial for users if tracker DB contents are up to date with the actual data. Also, IMHO adding special cases like this would break a design that makes tracker really extensible and easy to develop for. Regards, Carlos -- Carlos Garnacho Imendio AB - Expert solutions in GTK+ http://www.imendio.com
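Carlos's point about fseek() versus fread() can be illustrated by skipping length-prefixed strings. The record layout below (a 4-byte big-endian length followed by the bytes) is a hypothetical stand-in, not the actual Evolution summary encoding, and read_u32_be(), skip_string() and skip_message() are illustrative names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk layout: each string is a 4-byte big-endian length
 * followed by that many bytes. */
static int read_u32_be(FILE *f, uint32_t *out)
{
    unsigned char b[4];

    if (fread(b, 1, 4, f) != 4)
        return -1;
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 0;
}

/* Skipping a string still costs a read: the length must be fread() before
 * fseek() can jump over the payload. */
static int skip_string(FILE *f)
{
    uint32_t len;

    if (read_u32_be(f, &len) != 0)
        return -1;
    return fseek(f, (long)len, SEEK_CUR);
}

/* Skip a whole message record made of n_strings strings. */
static int skip_message(FILE *f, int n_strings)
{
    assert(n_strings >= 0);
    for (int i = 0; i < n_strings; i++)
        if (skip_string(f) != 0)
            return -1;
    return 0;
}
```

Even when a message is skipped, each string costs a small fread() for its length, so fseek() only saves reading the payload bytes; the file is still walked record by record.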
Re: [Tracker] more issues with indexer-split
On Wed, 2008-08-13 at 19:30 +0200, Carlos Garnacho wrote: Hi :) On Wed, 2008-08-13 at 11:47 -0400, Jamie McCracken wrote: On Wed, 2008-08-13 at 17:12 +0200, Carlos Garnacho wrote: Hi! On Tue, 2008-08-12 at 14:18 -0400, Jamie McCracken wrote: snip that sounds inefficient - trunk only ever checked for existing deleted or junk emails at startup because iterating through all emails in the summary files is expensive. From what I've read in trunk code, you still iterate through all the mails in the summary in check_summary_file(), and you will have to iterate over them again later to index new messages, etc... yes, but when we are not doing the startup check, we are skipping, so it's faster and we are not stopping at any deleted or junk email and checking it. How much time do you plan to save doing fseek() instead of fread()? I've updated the code in indexer-split to just read over the message when it gets to a deleted/junk message, and read_summary() could be changed to do fseek() if no data is requested. That makes the indexer-split code do one pass where trunk does two. Less disk head movement I'd say :) Also, take into account that you're forced to fread() even if you're skipping a message, since you have to know the strings' lengths to be able to skip them. I know that - what I want to avoid is doing lookups on the email services table whenever it returns null whenever there is a new email. As far as I know, it's quite unavoidable to parse the summaries again, since under some circumstances Message IDs could be reused, which would leave you with inconsistent data in the DBs. Even if it isn't, expunging a folder would render any stored offset for the summary file useless (even dangerous). true but we would get a deletion from inotify of the summary file if that was the case.
It's not a byte offset but a message count - so we skip x messages to get the new ones (similar to what beagle does). By Expunge I meant telling $MAIL_APP to get rid of deleted messages in the mail folder; in Evolution that would change the summary file and mess up offsets for sure. As far as I see, for mbox you're storing the offset in the stream: msg_offset = g_mime_parser_tell (mf->parser); mail_msg->offset = msg_offset; For IMAP, I just get 0 in the Services table, and I didn't see any code to do this either. imap stores message count too - it's a count rather than a byte offset. Besides, when testing summary parsing, I remember it was pretty fast (like 2-3 seconds for a ~6500-email summary), of course without inserting to DBs or doing message body or attachment sniffing, which is more or less what should happen if the junk/deleted flag is set. with 100,000+ emails it's quite noticeable. the use of a separate junk email table meant lookups were confined to that table and not the services table so it was faster when the number of emails was high. You mean the JunkMails table in email-meta.db? As far as I see, this table is just looked up to make sure there aren't duplicates when inserting. And in the end, you still have to look up/modify the Services table, even if the junk mail wasn't there. no, when a junk/deleted email is encountered during the startup scan, its UID is checked against that table (JunkMails) to see if we already know about it. If it's not in that table then we add it and then delete it from our index. Ergo it's more efficient than what you have. Could you tell me where that code is? The only users of InsertJunk/LookupJunk (the stored procedures) are tracker_db_email_insert_junk() and tracker_db_email_lookup_junk(); the former is also the only user of the latter, and it doesn't do what you mention.
The only place I see where it could delete emails from the DB for Evolution is check_summary_file(), and tracker_db_email_delete_email() seems to be called unconditionally for any junk/deleted message found. the way it should work is as described above. I had tested it and it works (deleted and junk emails are pruned on the next restart of trackerd). How do you currently tell which emails are new in the summary file? Without storing the count you cannot know without verifying each email exists in the services table (which would obviously be unacceptable performance-wise). we should also avoid doing this whenever the summary file changes which is why we stored an offset in trunk so we skip over messages to get to the new ones only when summary files change or do nothing if no new ones are present. As said above, I think there are pretty good reasons to avoid this. the trunk way is faster so i would prefer that restored. If you bear with me, I'd prefer to try a few optimizations before having to add special cases. well not doing the junk/deletion
Re: [Tracker] more issues with indexer-split
Hi! On Mon, 2008-08-11 at 22:49 -0400, Jamie McCracken wrote: As I will be working quite extensively on trunk post-merge I require any major changes to be done ASAP. list of changes required prior to merge (in order of priority) - all of these already exist and work in trunk: 1) trackerd: Handle file moves - update files in a directory recursively when a directory is renamed/moved (need to pause indexer before updating - watch out!). Likewise re-enable update of index from trackerd as it's needed for tagging and other user metadata. 2) I could not see deletion of deleted and junk emails at startup (I see it for new emails only but not for older emails that may have been marked) - pls restore this functionality from trunk. Code needs to check all summary files from start to end for junk and if not already marked as junk in the JunkMail table then you must delete them and mark them. If in doubt see trunk - older emails marked as deleted/junk will remain in index which is unacceptable. see function check_summary_file in trunk's tracker-email-evolution.c. This item is done in a different way: When trackerd notices the summary file has changed (or does the initial file processing), it notifies the indexer. The indexer, which is the one aware of the actual file contents, iterates through the messages, and the mail module will return NULL metadata for any junk/deleted email, making tracker-indexer delete anything in the DBs/index related to these mails. Regards, Carlos
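The indexer-split flow Carlos describes (the mail module returns NULL metadata for junk/deleted messages, and tracker-indexer then removes anything stored for them) can be sketched like this. Message, Metadata, module_get_metadata() and indexer_process_summary() are hypothetical, heavily reduced stand-ins for the real module interface, and the removal step is only counted rather than performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced message and metadata records. */
typedef struct {
    unsigned    uid;
    bool        junk_or_deleted;
    const char *subject;
} Message;

typedef struct {
    const char *subject;
} Metadata;

/* Module side: return NULL for messages that must not be in the index. */
static const Metadata *module_get_metadata(const Message *msg,
                                           Metadata *storage)
{
    if (msg->junk_or_deleted)
        return NULL;
    storage->subject = msg->subject;
    return storage;
}

typedef struct {
    size_t indexed;
    size_t removed;
} IndexerStats;

/* Indexer side: one pass over the summary whenever trackerd reports a
 * change. NULL metadata means "remove anything stored for this message". */
static IndexerStats indexer_process_summary(const Message *msgs, size_t n)
{
    IndexerStats st = {0, 0};

    for (size_t i = 0; i < n; i++) {
        Metadata storage;
        const Metadata *md = module_get_metadata(&msgs[i], &storage);

        if (md == NULL) {
            /* the real indexer would delete DB rows and index entries */
            st.removed++;
        } else {
            assert(md->subject != NULL);
            st.indexed++;
        }
    }
    return st;
}
```

Every summary change triggers a full pass like this one, which is the cost Jamie objects to; the trade-off is that flag changes made while trackerd was not running are picked up on the next pass without any special-case code in the module.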
Re: [Tracker] more issues with indexer-split
Jamie McCracken wrote: As I will be working quite extensively on trunk post-merge I require any major changes to be done ASAP. OK. list of changes required prior to merge (in order of priority) - all of these already exist and work in trunk: 1) trackerd: Handle file moves - update files in a directory recursively when a directory is renamed/moved (need to pause indexer before updating - watch out!). Likewise re-enable update of index from trackerd as it's needed for tagging and other user metadata. This was heavily worked on today; progress is ongoing. 2) I could not see deletion of deleted and junk emails at startup (I see it for new emails only but not for older emails that may have been marked) - pls restore this functionality from trunk. Code needs to check all summary files from start to end for junk and if not already marked as junk in the JunkMail table then you must delete them and mark them. If in doubt see trunk - older emails marked as deleted/junk will remain in index which is unacceptable. see function check_summary_file in trunk's tracker-email-evolution.c. Carlos has some comments to add here. 3) Clean and timely shutdown of trackerd and indexer if running (maybe have a tracker-shutdown command as we have a dbus interface for this which is needed by the prefs/applet). Anyway, the crawler must end immediately and must regularly force glib mainloop iterations so that any dbus requests are handled in a timely fashion (they appear to be heavily slowed down by the crawler). Fixed. 4) Pause on detected external IO (i.e. from other apps). If we detect writes in the watched directories the original trackerd paused for a few seconds. this greatly sped up things like compiling sources which were dog slow due to indexing. This appears to have been removed in the branch - see pause_io in tracker-status.c. Fixed. 5) pause on battery (both for first time index and subsequent ones) - needs to be done I think? (appears commented out in tracker-status.c) Fixed.
-- Regards, Martyn
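Items 3 to 5 boil down to a scheduling policy. The sketch below is an illustration under stated assumptions, not the code in tracker-status.c: IndexerState, indexer_should_pause() and crawl_step() are hypothetical names, IO_PAUSE_SECONDS stands in for "a few seconds", and CRAWL_BATCH is an arbitrary batch size. crawl_step() follows the convention of GLib idle callbacks (return TRUE to be called again), so that between batches the main loop can dispatch pending D-Bus requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define IO_PAUSE_SECONDS 5        /* hypothetical "few seconds" */
#define CRAWL_BATCH ((size_t)50)  /* hypothetical batch size */

typedef struct {
    time_t last_external_write;   /* 0 if no external write seen yet */
    bool   on_battery;
    bool   index_on_battery;      /* hypothetical user preference */
} IndexerState;

/* Called when a write by another application is seen in a watched dir. */
static void note_external_write(IndexerState *s, time_t now)
{
    s->last_external_write = now;
}

/* Pause on battery (unless allowed) and for a short window after any
 * external write, so builds and other IO-heavy work are not slowed down. */
static bool indexer_should_pause(const IndexerState *s, time_t now)
{
    assert(s != NULL);
    if (s->on_battery && !s->index_on_battery)
        return true;
    if (s->last_external_write != 0 &&
        difftime(now, s->last_external_write) < IO_PAUSE_SECONDS)
        return true;
    return false;
}

/* Process at most CRAWL_BATCH entries, then report whether work remains,
 * giving the main loop a chance to run between batches. */
static bool crawl_step(size_t *next, size_t total)
{
    assert(next != NULL && *next <= total);
    size_t end = (total - *next > CRAWL_BATCH) ? *next + CRAWL_BATCH : total;

    while (*next < end) {
        /* ...crawl one directory entry here... */
        (*next)++;
    }
    return *next < total;
}
```

Returning to the main loop after each batch is an alternative to explicitly iterating it from inside the crawler (for example with g_main_context_iteration()); either way, D-Bus requests no longer wait for the whole crawl to finish.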
Re: [Tracker] more issues with indexer-split
On Tue, 2008-08-12 at 18:01 +0100, Martyn Russell wrote: Jamie McCracken wrote: As I will be working quite extensively on trunk post-merge I require any major changes to be done ASAP. OK. list of changes required prior to merge (in order of priority) - all of these already exist and work in trunk: 1) trackerd: Handle file moves - update files in a directory recursively when a directory is renamed/moved (need to pause indexer before updating - watch out!). Likewise re-enable update of index from trackerd as it's needed for tagging and other user metadata. This was heavily worked on today; progress is ongoing. 2) I could not see deletion of deleted and junk emails at startup (I see it for new emails only but not for older emails that may have been marked) - pls restore this functionality from trunk. Code needs to check all summary files from start to end for junk and if not already marked as junk in the JunkMail table then you must delete them and mark them. If in doubt see trunk - older emails marked as deleted/junk will remain in index which is unacceptable. see function check_summary_file in trunk's tracker-email-evolution.c. Carlos has some comments to add here. 3) Clean and timely shutdown of trackerd and indexer if running (maybe have a tracker-shutdown command as we have a dbus interface for this which is needed by the prefs/applet). Anyway, the crawler must end immediately and must regularly force glib mainloop iterations so that any dbus requests are handled in a timely fashion (they appear to be heavily slowed down by the crawler). Fixed. 4) Pause on detected external IO (i.e. from other apps). If we detect writes in the watched directories the original trackerd paused for a few seconds. this greatly sped up things like compiling sources which were dog slow due to indexing. This appears to have been removed in the branch - see pause_io in tracker-status.c. Fixed. 5) pause on battery (both for first time index and subsequent ones) - needs to be done I think?
(appears commented out in tracker-status.c) Fixed. thanks - sounds like it will be ready in a few days jamie