Re: [Tracker] more issues with indexer-split

2008-09-18 Thread Martyn Russell
Jamie McCracken wrote:
 On Tue, 2008-09-16 at 18:35 +0100, Martyn Russell wrote:
 
 yes its much better
 
 tracker-search works on command line but tracker-search-tool still kills
 the daemon

Which API call was it that kills the daemon again? The GetHitCountAll?

 Im happy for you to merge once that last issue is resolved

I just ran the TST in valgrind and it was fine with a few leaks of course.

 if you cant replicate it then we can as you say sort it out at the
 hackfest

I have to leave in a few hours so perhaps we can just do that.

 thanks for all your (and your teams) hard work. Im very happy with
 indexer-split and it now runs like a charm

Our pleasure.

-- 
Regards,
Martyn
___
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list


Re: [Tracker] more issues with indexer-split

2008-09-16 Thread Martyn Russell
Jamie McCracken wrote:
 On Fri, 2008-09-12 at 18:35 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
 Note that existing config options must be respected as otherwise
 upgrading will be impossible for existing users
 This option is not honoured then. I do it one of 2 ways right now. Either:

 1. trackerd -d evolution
 2. DisabledModules=evolution;

 The IndexEvolutionEmails option must have been overlooked.
 Can I ask, have you tried removing your config file too?

 
 no because it needs to work with existing settings

I have fixed this issue now, so legacy options like IndexEvolutionEmails
now work.

 I will look into working on an upgrade path to fix this on Monday.
 
 ok - I will be travelling monday to UK - I will try and check again on
 tuesday

OK.

We have fixed a whole host of issues in the last 2 days, one of
which was the huge slowdown in indexing speed. Other efficiencies have
been made too.

 Which file are you checking for indexed words?
 
 all of them - I just use my name, jamie, to search as it should match all
 files as they are in path /home/jamie

I installed packages on the N810 device just tonight and this all works
as it should:

Checking for extensions by partial name matching:
=================================================

$ tracker-search -s Files pdf
Results:
/home/user/MyDocs/.documents/osso_software_copyright.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/User_guide_English_US.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/Gebruikershandleiding_Nederlands.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/User_guide_English_GB.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/Brukermanual_Norsk.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/User_guide_Arabic.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/Brugervejledning_Dansk.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/Bedienungsanleitung_Deutsch.pdf
/home/user/MyDocs/.documents/~sfil_li_folder_user_guides/Manuale_d'uso_Italiano.pdf


Searching for all folders:
==========================

$ tracker-files -s Folders
Results:
  /home/user/MyDocs/.documents/~sfil_li_folder_user_guides


Searching for all music:
========================

$ tracker-files -s Music
Results:
  /home/user/MyDocs/.sounds/Moby-In_My_Heart.mp3


We can see if we can help you in Berlin. See you there! :)

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-12 Thread Martyn Russell
Jamie McCracken wrote:
 On Wed, 2008-09-10 at 23:20 -0400, Jamie McCracken wrote:
 Im afraid Im unable to run latest svn

 trackerd dies every time I try and search for something

 I did following:

 svn up
 make distclean
 make
 sudo make install
 sudo rm -rf /usr/bin/trackerd
 sudo rm -rf /usr/bin/tracker-indexer
 rm -rf ~/.cache/tracker
 rm -rf ~/.local/share/tracker
 trackerd -v 3 

 searching with tracker-search-tool crashes trackerd
 searching with tracker-search returns no result (when it should)

I don't see this crash.

Note: in the last 24 hrs, Mikael has fixed a couple of nasty issues
which could improve your situation. Namely metadata date handling issues
and MP3 & JPEG extractor fixes. I also changed the extractor to use
libtracker-common functions instead of the duplicated code it was using.

 trackerd output showing it continuously outputs :
 Tracker-Message: Indexed 105/425, module:'evolution', 07m 13s left, 02m
 22s elapsed

Note: It will spit status messages out roughly every 10 seconds to keep
the daemon and applet up to date. Even if nothing has happened.

 note I have evol email indexing disabled

How have you disabled that?

I tried it with -d evolution and with the DisabledModules config option
in the .cfg file. Both worked fine for me.

 can you verify it runs correctly with reindex when evo email indexing is
 set to false?

Yes, I have done that twice. I have no problems with the indexing or
searching with the search tools either. I have tried this on my desktop
and I have done this on the Nokia device too. Both work properly.

Are you able to try on another machine?

 for me it appears the index never flushes as it constantly tries to
 index evo stuff but the indexer rejects it

Hmm, if you run the daemon with -v 3 it should say when it gets to the
evolution module if it is disabled or not.

You don't have the evolution mail directory in your WatchDirectoryRoots
do you?

 also can you confirm if tracker-search-tool crashes trackerd when you
 supply a search term that does not exist in the index?

I tested that: with a word similar to another, it suggests something
(which shows results when I click on it), and with something completely
incomprehensible it just says it couldn't find anything. I don't get any
crashes.

I have asked Philip to do the same too to make sure it isn't something
I am doing. I have a feeling it might be the content you are indexing.

Does it crash for you if you index a small selection of files? If you
could try a couple of places with only a few files in a directory that
would really help us identify if it was content based or not.

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-12 Thread Jamie McCracken
On Fri, 2008-09-12 at 10:27 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  On Wed, 2008-09-10 at 23:20 -0400, Jamie McCracken wrote:
  Im afraid Im unable to run latest svn
 
  trackerd dies every time I try and search for something
 
  I did following:
 
  svn up
  make distclean
  make
  sudo make install
  sudo rm -rf /usr/bin/trackerd
  sudo rm -rf /usr/bin/tracker-indexer
  rm -rf ~/.cache/tracker
  rm -rf ~/.local/share/tracker
  trackerd -v 3 
 
  searching with tracker-search-tool crashes trackerd
  searching with tracker-search returns no result (when it should)
 
 I don't see this crash.
 
 Note: in the last 24 hrs, Mikael has fixed a couple of nasty issues
 which could improve your situation. Namely metadata date handling issues
 and MP3 & JPEG extractor fixes. I also changed the extractor to use
 libtracker-common functions instead of the duplicated code it was using.
 
  trackerd output showing it continuously outputs :
  Tracker-Message: Indexed 105/425, module:'evolution', 07m 13s left, 02m
  22s elapsed
 
 Note: It will spit status messages out roughly every 10 seconds to keep
 the daemon and applet up to date. Even if nothing has happened.

I know but it just outputs the above continuously with no change in the
indexed count

and it does not flush to index which means I cant search for anything
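
One way to tell a stalled indexer from routine status output is to compare
the counts in successive status lines. The following is a minimal sketch,
assuming the status lines keep the format quoted above; the helper name
detect_stall and the default threshold of three repeats are illustrative
choices, not part of Tracker.

```shell
# detect_stall: read trackerd output on stdin and report when the
# "Indexed N/M" count repeats for THRESHOLD consecutive status lines.
# Hypothetical helper; only the line format is taken from the log above.
detect_stall() {
    threshold="${1:-3}"
    awk -v threshold="$threshold" '
        /Tracker-Message: Indexed [0-9]+\/[0-9]+/ {
            # Pull out the "N/M," token that follows the word "Indexed".
            for (i = 1; i <= NF; i++) {
                if ($i == "Indexed") { count = $(i + 1); break }
            }
            sub(/,$/, "", count)            # drop the trailing comma
            repeats = (count == last) ? repeats + 1 : 1
            last = count
            if (repeats == threshold) {
                print "stalled at " count
            }
        }
    '
}
```

It could be used as `trackerd -v 3 2>&1 | detect_stall 3`. Because status
messages are printed roughly every 10 seconds even when nothing changes, a
small threshold will also fire during legitimately idle periods.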

 
  note I have evol email indexing disabled
 
 How have you disabled that?

In the tracker.cfg file in ~/.config/tracker I set
IndexEvolutionEmails=false

Note that existing config options must be respected as otherwise
upgrading will be impossible for existing users


 
 I tried it with -d evolution and with the DisabledModules config option
 in the .cfg file. Both worked fine for me.
 
  can you verify it runs correctly with reindex when evo email indexing is
  set to false?
 
 Yes, I have done that twice. I have no problems with the indexing or
 searching with the search tools either. I have tried this on my desktop
 and I have done this on the Nokia device too. Both work properly.
 
 Are you able to try on another machine?

nope 
 
  for me it appears the index never flushes as it constantly tries to
  index evo stuff but the indexer rejects it
 
 Hmm, if you run the daemon with -v 3 it should say when it gets to the
 evolution module if it is disabled or not.

it does not say anything about that

 
 You don't have the evolution mail directory in your WatchDirectoryRoots
 do you?

no

 
  also can you confirm if tracker-search-tool crashes trackerd when you
  supply a search term that does not exist in the index?
 
 Test that and with a word similar to another it suggests something
 (which shows results when I click on it) and with something completely
 incomprehensible it just says it couldn't find anything. I don't get any
 crashes.
 
 I have asked Phillip to do the same too to make sure it isn't something
 I am doing. I have a feeling it might be the content you are indexing.
 
 Does it crash for you if you index a small selection of files? If you
 could try a couple of places with only a few files in a directory that
 would really help us identify if it was content based or not.

will play some more but the index file size indicates it is empty so no
flushing has occurred

jamie



Re: [Tracker] more issues with indexer-split

2008-09-12 Thread Martyn Russell
Jamie McCracken wrote:
 On Fri, 2008-09-12 at 10:27 +0100, Martyn Russell wrote:
 I know but it just outputs the above continuously with no change in the
 indexed count
 
 and it does not flush to index which means I cant search for anything

Flushing doesn't happen when the status message is printed. It is done
once a minute as I recall. Also, you can't use the index until the index
is closed by the indexer (which is usually when it finishes OR I *think*
when a request comes in to the daemon).

 note I have evol email indexing disabled
 How have you disabled that?
 
 tracker.cfg file in~/ .config/tracker- I set
 IndexEvolutionEmails=false
 
 Note that existing config options must be respected as otherwise
 upgrading will be impossible for existing users

This option is not honoured then. I do it one of 2 ways right now. Either:

1. trackerd -d evolution
2. DisabledModules=evolution;

The IndexEvolutionEmails option must have been overlooked.
Can I ask, have you tried removing your config file too?

I will look into working on an upgrade path to fix this on Monday.
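
An upgrade path of the kind mentioned above could translate the legacy key
into the new one when the config is read. The sketch below only illustrates
the idea in shell: legacy_disabled_modules is a hypothetical helper, and it
assumes plain "Key=value" lines in tracker.cfg. It is not how trackerd
itself implements this.

```shell
# legacy_disabled_modules: print the DisabledModules value implied by a
# config file, honouring the legacy IndexEvolutionEmails=false setting.
# Hypothetical helper; assumes plain "Key=value" lines in the file.
legacy_disabled_modules() {
    cfg="$1"
    # Start from whatever DisabledModules already lists (may be empty).
    modules=$(sed -n 's/^DisabledModules=//p' "$cfg" | head -n 1)
    # Normalise to the "name;" form used in the example above.
    case "$modules" in
        ""|*";") ;;
        *) modules="$modules;" ;;
    esac
    if grep -q '^IndexEvolutionEmails=false' "$cfg"; then
        case ";$modules" in
            *";evolution;"*) ;;                  # already disabled
            *) modules="${modules}evolution;" ;;
        esac
    fi
    printf '%s\n' "$modules"
}
```

For a config containing only IndexEvolutionEmails=false this prints
"evolution;", which matches the DisabledModules=evolution; form shown above.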

 Are you able to try on another machine?
 
 nope 

Well, I have tried in 2 different locations and Philip has tried to
reproduce your issues too. It works for us :/

 also can you confirm if tracker-search-tool crashes trackerd when you
 supply a search term that does not exist in the index?
 Test that and with a word similar to another it suggests something
 (which shows results when I click on it) and with something completely
 incomprehensible it just says it couldn't find anything. I don't get any
 crashes.

 I have asked Phillip to do the same too to make sure it isn't something
 I am doing. I have a feeling it might be the content you are indexing.

 Does it crash for you if you index a small selection of files? If you
 could try a couple of places with only a few files in a directory that
 would really help us identify if it was content based or not.
 
 will play some more but index file size indicate sits empty so no flushing 
 has occurred

Email is a special case, it isn't a 1 file = 1 index increment because
there are parts of emails which can be considered unique units to index.
Other than that I have noticed this and we can improve on it.

Which file are you checking for indexed words?

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-12 Thread Jamie McCracken
On Fri, 2008-09-12 at 18:35 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  On Fri, 2008-09-12 at 10:27 +0100, Martyn Russell wrote:
  I know but it just outputs the above continuously with no change in the
  indexed count
  
  and it does not flush to index which means I cant search for anything
 
 Flushing doesn't happen when the status message is printed. It is done
 once a minute as I recall. Also, you can't use the index until the index
 is closed by the indexer (which is usually when it finishes OR I *think*
 when a request comes in to the daemon).
 
  note I have evol email indexing disabled
  How have you disabled that?
  
  tracker.cfg file in~/ .config/tracker- I set
  IndexEvolutionEmails=false
  
  Note that existing config options must be respected as otherwise
  upgrading will be impossible for existing users
 
 This option is not honoured then. I do it one of 2 ways right now. Either:
 
 1. trackerd -d evolution
 2. DisabledModules=evolution;
 
 The IndexEvolutionEmails option must have been overlooked.
 Can I ask, have you tried removing your config file too?
 

no because it needs to work with existing settings

 I will look into working on an upgrade path to fix this on Monday.

ok - I will be travelling monday to UK - I will try and check again on
tuesday


 
  Are you able to try on another machine?
  
  nope 
 
 Well, I have tried in 2 different locations and Phillip has tried to
 reproduce your issues too. It works for us :/
 
  also can you confirm if tracker-search-tool crashes trackerd when you
  supply a search term that does not exist in the index?
  Test that and with a word similar to another it suggests something
  (which shows results when I click on it) and with something completely
  incomprehensible it just says it couldn't find anything. I don't get any
  crashes.
 
  I have asked Phillip to do the same too to make sure it isn't something
  I am doing. I have a feeling it might be the content you are indexing.
 
  Does it crash for you if you index a small selection of files? If you
  could try a couple of places with only a few files in a directory that
  would really help us identify if it was content based or not.
  
  will play some more but index file size indicate sits empty so no flushing 
  has occurred
 
 Email is a special case, it isn't a 1 file = 1 index increment because
 there are parts of emails which can be considered unique units to index.
 Other than that I have noticed this and we can improve on it.
 
 Which file are you checking for indexed words?

all of them - I just use my name, jamie, to search as it should match all
files as they are in path /home/jamie


jamie



Re: [Tracker] more issues with indexer-split

2008-09-11 Thread Martyn Russell
Jamie McCracken wrote:
 Im afraid Im unable to run latest svn
 
 trackerd dies every time I try and search for something
 
 I did following:
 
 svn up
 make distclean

I would use:

  make maintainer-clean

 make
 sudo make install

Make sure you are not running trackerd or tracker-indexer first.

Before doing any installing or even the make maintainer-clean I would:

  sudo make uninstall
  sudo rm -Rf /usr/bin/tracker*
  sudo rm -Rf /usr/libexec/tracker*
  sudo rm -Rf /usr/lib/tracker/
  sudo rm -Rf /usr/share/tracker/
  rm -Rf ~/.cache/tracker
  rm -Rf ~/.local/share/tracker
  rm -Rf ~/.config/tracker

I would also do:

  find /usr -name '*tracker*'

And make sure EVERYTHING is removed first. Then when you autogen again, use:

  CFLAGS="-g -O0" ./autogen.sh --prefix=/usr --localstatedir=/var \
    --sysconfdir=/etc

(the CFLAGS are there in case you need to run it under valgrind or gdb)

BUT even better, completely re-check out the branch to be sure.
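
The clean-up steps above can be collected into a script. The version below
is a sketch that only prints the commands (a dry run), so the list can be
reviewed before anything is deleted. The function name tracker_clean_plan
is illustrative; the paths and options are the ones given in this message.

```shell
# tracker_clean_plan: print, without executing, the clean-up commands
# suggested above. Pipe the output to "sh" only after reviewing it, and
# only once trackerd and tracker-indexer are no longer running.
tracker_clean_plan() {
    echo "sudo make uninstall"
    # System-wide files left behind by earlier installs.
    for path in '/usr/bin/tracker*' '/usr/libexec/tracker*' \
                /usr/lib/tracker/ /usr/share/tracker/; do
        echo "sudo rm -Rf $path"
    done
    # Per-user cache, data and configuration. The literal ~ is expanded
    # by whichever shell eventually runs the plan.
    for path in '~/.cache/tracker' '~/.local/share/tracker' \
                '~/.config/tracker'; do
        echo "rm -Rf $path"
    done
    # Anything this still lists has to be removed by hand.
    echo "find /usr -name '*tracker*'"
    echo "CFLAGS=\"-g -O0\" ./autogen.sh --prefix=/usr --localstatedir=/var --sysconfdir=/etc"
}
```

Running tracker_clean_plan on its own changes nothing on disk, which makes
it safe to compare against what is actually installed first.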

 sudo rm -rf /usr/bin/trackerd
 sudo rm -rf /usr/bin/tracker-indexer

There is also the thumbnailer which is no longer installed in /usr/bin.

 rm -rf ~/.cache/tracker
 rm -rf ~/.local/share/tracker

As above, don't forget the config.

 trackerd -v 3 

I would make sure you run a specific version instead of just letting the
path find trackerd, i.e. /usr/libexec/trackerd.

 searching with tracker-search-tool crashes trackerd
 searching with tracker-search returns no result (when it should)

I have freshly installed the packages on the Maemo device this week and
the tracker-search works fine (of course the TST doesn't).


 trackerd output showing it continuously outputs :
 Tracker-Message: Indexed 105/425, module:'evolution', 07m 13s left, 02m
 22s elapsed
 note I have evol email indexing disabled

I will check this. I know it happens, I just need to fix it. I found
indexing today to be incredibly slow, so I was just investigating that.

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-11 Thread Martyn Russell
Martyn Russell wrote:
 Jamie McCracken wrote:
 searching with tracker-search-tool crashes trackerd
 searching with tracker-search returns no result (when it should)
 
 I have freshly installed the packages on the Maemo device this week and
 the tracker-search works fine (of course the TST doesn't).

And it works - I meant to say.

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-10 Thread Jamie McCracken

Im afraid Im unable to run latest svn

trackerd dies every time I try and search for something

I did following:

svn up
make distclean
make
sudo make install
sudo rm -rf /usr/bin/trackerd
sudo rm -rf /usr/bin/tracker-indexer
rm -rf ~/.cache/tracker
rm -rf ~/.local/share/tracker
trackerd -v 3 

searching with tracker-search-tool crashes trackerd
searching with tracker-search returns no result (when it should)

trackerd output showing it continuously outputs :
Tracker-Message: Indexed 105/425, module:'evolution', 07m 13s left, 02m
22s elapsed

note I have evol email indexing disabled

jamie


 Jamie McCracken wrote:
  On Tue, 2008-09-09 at 18:02 +0200, Philip Van Hoof wrote:
  On Tue, 2008-09-09 at 11:56 -0400, Jamie McCracken wrote:
  On Tue, 2008-09-09 at 16:30 +0100, Martyn Russell wrote:
  Jamie can we try to get this merge done this week please
  sure - just need to check your additions which I will try to do
  tonight
  
  obviously post merge you will have to submit major patches to me
  so best to get as much in as possible before that
  It might make sense for us to continue bleeding edge development in
  the branch and after (your) intensive review merge that diff to
  trunk.
  
  Not sure how others feel about this?
  
  I dont mind if you want to sync what you have so far with trunk and
  then continue on indexer-split
  
  What you have so far is not ready for a release so it does not make
  much difference whether we merge or not at this point. Obviously if
  martyn and co feel strongly about merging then I will oblige. I have
  full confidence in you guys and am happy to prove it :)
 
 Phillip is right. From day to day when we are all working (and not on
 vacation like now) we tend to range on average from 5 to 20 commits a
 day. That's quite a lot of review work for you. There are 5 of us
 working on Tracker right now and it has been that way for a number of
 months. Unless you want more email of course :P
 
 Any thing major would be discussed with you first before it was
 implemented and potentially done in a separate branch anyway.
 
 The only thing I would say is that creating a diff for TRUNK is probably
 not a good idea. it is probably best to just copy everything right over
 after doing a pre-merge tag. There are a LOT of file differences and
 created/deleted files too.
 
 If you want me to do the merge, I can, just let me know.
 



Re: [Tracker] more issues with indexer-split

2008-09-10 Thread Jamie McCracken
On Wed, 2008-09-10 at 23:20 -0400, Jamie McCracken wrote:
 Im afraid Im unable to run latest svn
 
 trackerd dies every time I try and search for something
 
 I did following:
 
 svn up
 make distclean
 make
 sudo make install
 sudo rm -rf /usr/bin/trackerd
 sudo rm -rf /usr/bin/tracker-indexer
 rm -rf ~/.cache/tracker
 rm -rf ~/.local/share/tracker
 trackerd -v 3 
 
 searching with tracker-search-tool crashes trackerd
 searching with tracker-search returns no result (when it should)
 
 trackerd output showing it continuously outputs :
 Tracker-Message: Indexed 105/425, module:'evolution', 07m 13s left, 02m
 22s elapsed
 
 note I have evol email indexing disabled
 
 jamie

can you verify it runs correctly with reindex when evo email indexing is
set to false?

for me it appears the index never flushes as it constantly tries to
index evo stuff but the indexer rejects it 

also can you confirm if tracker-search-tool crashes trackerd when you
supply a search term that does not exist in the index?

jamie



Re: [Tracker] more issues with indexer-split

2008-09-09 Thread Martyn Russell
Martyn Russell wrote:
 Martyn Russell wrote:
 Hi,

 So I have been reading up on the things that are remaining for merging.
 This is the list I have so far which I will be working on:

 * Check the move files/directories issue. I *think* it works.
 
 Work on this will be continuing Monday.

This should be fixed now. The to and from strings were simply the wrong
way round. The code has also been improved here to use the file event
queue too which means we can make full use of the state machine also.

 * Make private libraries .so files to dynamically load them.
 
 I have made libtracker-common, libtracker-db and libstemmer all .so files.
 
 But I think what you meant was to make each language a .so which we
 dlopen() using something like GModule, right? That we can possibly do
 next week. I don't think this should stop the merge to be honest. There
 are bigger problems to address first.

I will start on this tomorrow. But we don't need it for the merge.

 * The directory mtime issue on startup.

This is fixed.

 Have I missed anything?
 
 New items:
 
 * Check mtime for summary files too.

I have yet to do this. I spoke briefly to Carlos about it. His opinion
was that it isn't necessary. I agree that it certainly isn't necessary
for the merge. I can look into this some time after the .so issue.

Jamie can we try to get this merge done this week please?

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-09 Thread Jamie McCracken
On Tue, 2008-09-09 at 16:30 +0100, Martyn Russell wrote:
 Martyn Russell wrote:
  Martyn Russell wrote:
  Hi,
 
  So I have been reading up on the things that are remaining for merging.
  This is the list I have so far which I will be working on:
 
  * Check the move files/directories issue. I *think* it works.
  
  Work on this will be continuing Monday.
 
 This should be fixed now. The to and from strings were simply the wrong
 way round. The code has also been improved here to use the file event
 queue too which means we can make full use of the state machine also.
 
  * Make private libraries .so files to dynamically load them.
  
  I have made libtracker-common, libtracker-db and libstemmer all .so files.
  
  But I think what you meant was to make each language a .so which we
  dlopen() using something like GModule, right? That we can possibly do
  next week. I don't think this should stop the merge to be honest. There
  are bigger problems to address first.
 
 I will start on this tomorrow. But we don't need it for the merge.
 
  * The directory mtime issue on startup.
 
 This is fixed.
 
  Have I missed anything?
  
  New items:
  
  * Check mtime for summary files too.
 
 I have yet to do this. I spoke briefly to Carlos about it. His opinion
 was that it isn't necessary. I agree that it certainly isn't necessary
 for the merge. I can look into this some time after the .so issue.
 
 Jamie can we try to get this merge done this week please

sure - just need to check your additions which I will try to do tonight

obviously post merge you will have to submit major patches to me so best to get 
as much in as possible before that

jamie









Re: [Tracker] more issues with indexer-split

2008-09-09 Thread Philip Van Hoof
On Tue, 2008-09-09 at 11:56 -0400, Jamie McCracken wrote:
 On Tue, 2008-09-09 at 16:30 +0100, Martyn Russell wrote:

  
  Jamie can we try to get this merge done this week please
 
 sure - just need to check your additions which I will try to do tonight
 
 obviously post merge you will have to submit major patches to me so best to 
 get as much in as possible before that

It might make sense for us to continue bleeding edge development in the
branch and after (your) intensive review merge that diff to trunk.

Not sure how others feel about this?

The thing is that our team is changing things quite fast. For our team
we need a shared repository anyway. So either it would be a branch, or
we'd have our own private team repository (git-like, or indeed a git one
- which might make a lot of sense -).

Obviously we prefer to do things at the upstream project asap. Private
repositories are not cool in at least my opinion. But having to wait for
individual approvals of each and every patch ... would also block our
methodology a little bit.

Anyway ... my proposal for post-merge is to discuss this together at the
Maemo Desktop Search hackfest in Berlin.


-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be






Re: [Tracker] more issues with indexer-split

2008-09-09 Thread Jamie McCracken
On Tue, 2008-09-09 at 18:02 +0200, Philip Van Hoof wrote:
 On Tue, 2008-09-09 at 11:56 -0400, Jamie McCracken wrote:
  On Tue, 2008-09-09 at 16:30 +0100, Martyn Russell wrote:
 
   
   Jamie can we try to get this merge done this week please
  
  sure - just need to check your additions which I will try to do tonight
  
  obviously post merge you will have to submit major patches to me so best to 
  get as much in as possible before that
 
 It might make sense for us to continue bleeding edge development in the
 branch and after (your) intensive review merge that diff to trunk.
 
 Not sure how others feel about this?

I dont mind if you want to sync what you have so far with trunk and then
continue on indexer-split

What you have so far is not ready for a release so it does not make much
difference whether we merge or not at this point. Obviously if martyn
and co feel strongly about merging then I will oblige. I have full
confidence in you guys and am happy to prove it :)

jamie




Re: [Tracker] more issues with indexer-split

2008-09-09 Thread Martyn Russell
Jamie McCracken wrote:
 On Tue, 2008-09-09 at 18:02 +0200, Philip Van Hoof wrote:
 On Tue, 2008-09-09 at 11:56 -0400, Jamie McCracken wrote:
 On Tue, 2008-09-09 at 16:30 +0100, Martyn Russell wrote:
 Jamie can we try to get this merge done this week please
 sure - just need to check your additions which I will try to do
 tonight
 
 obviously post merge you will have to submit major patches to me
 so best to get as much in as possible before that
 It might make sense for us to continue bleeding edge development in
 the branch and after (your) intensive review merge that diff to
 trunk.
 
 Not sure how others feel about this?
 
 I dont mind if you want to sync what you have so far with trunk and
 then continue on indexer-split
 
 What you have so far is not ready for a release so it does not make
 much difference whether we merge or not at this point. Obviously if
 martyn and co feel strongly about merging then I will oblige. I have
 full confidence in you guys and am happy to prove it :)

Philip is right. From day to day when we are all working (and not on
vacation like now) we tend to range on average from 5 to 20 commits a
day. That's quite a lot of review work for you. There are 5 of us
working on Tracker right now and it has been that way for a number of
months. Unless you want more email of course :P

Anything major would be discussed with you first before it was
implemented and potentially done in a separate branch anyway.

The only thing I would say is that creating a diff for TRUNK is probably
not a good idea. It is probably best to just copy everything right over
after doing a pre-merge tag. There are a LOT of file differences and
created/deleted files too.

If you want me to do the merge, I can, just let me know.

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-05 Thread Martyn Russell
Martyn Russell wrote:
 Hi,
 
 So I have been reading up on the things that are remaining for merging.
 This is the list I have so far which I will be working on:
 
 * Check the move files/directories issue. I *think* it works.

Work on this will be continuing Monday.

 * Fix the get_file_contents() function so it checks for #13 in the first
 64Kb.

This should be fixed now. I actually took a different approach on this.
The code has a #define at the top to switch the behaviour here between

* Validating up to the last valid UTF-8 character (default).

* Checking for valid UTF-8, if not, trying to convert from locale and if
unsuccessful dropping the file (behaviour in TRUNK).
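
The second behaviour above (the one described as TRUNK's) can be
approximated from the shell with iconv, which is handy for checking whether
a particular file would be kept or dropped. This is a sketch under
assumptions: classify_encoding is a hypothetical helper, the fallback
charset is passed in explicitly rather than read from the locale, and it
says nothing about how get_file_contents() is actually written.

```shell
# classify_encoding FILE FALLBACK_CHARSET
# Prints "utf8" if FILE is valid UTF-8, "converted" if it is not but can
# be converted from FALLBACK_CHARSET, and "drop" otherwise.
classify_encoding() {
    file="$1"
    fallback="$2"
    if iconv -f UTF-8 -t UTF-8 "$file" >/dev/null 2>&1; then
        echo "utf8"
    elif iconv -f "$fallback" -t UTF-8 "$file" >/dev/null 2>&1; then
        echo "converted"
    else
        echo "drop"
    fi
}
```

To mimic the "convert from locale" step, the second argument could be the
output of `locale charmap`. The default behaviour (keeping the text up to
the last valid character) is not covered by this sketch.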

 * Make private libraries .so files to dynamically load them.

I have made libtracker-common, libtracker-db and libstemmer all .so files.

But I think what you meant was to make each language a .so which we
dlopen() using something like GModule, right? That we can possibly do
next week. I don't think this should stop the merge to be honest. There
are bigger problems to address first.

 * The directory mtime issue on startup.

Work on this will be continuing on Monday.

 Have I missed anything?

New items:

* Check mtime for summary files too.

 Im also adding my tracker-fts stuff into that branch so will likely
 merge when above + my stuff is ready

Yea. Can you make sure your code compiles without warnings before
committing in the future if possible :)

With the new code additions, make distcheck fails and building packages
is impossible. For now, I made your code optional (disabled by default).
To compile it, you can use --enable-sqlite-fts with configure.

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-05 Thread Jamie McCracken
On Fri, 2008-09-05 at 13:19 +0100, Martyn Russell wrote:
 Martyn Russell wrote:
  Hi,
  
  So I have been reading up on the things that are remaining for merging.
  This is the list I have so far which I will be working on:
  
  * Check the move files/directories issue. I *think* it works.
 
 Work on this will be continuing Monday.
 
  * Fix the get_file_contents() function so it checks for #13 in the first
  64Kb.
 
 This should be fixed now. I actually took a different approach on this.
 The code has a #define at the top to switch the behaviour here between
 
 * Validating up to the most valid UTF-8 character (default).
 
 * Checking for valid UTF-8, if not, trying to convert from locale and if
 unsuccessful dropping the file (behaviour in TRUNK).
 
  * Make private libraries .so files to dynamically load them.
 
 I have made libtracker-common, libtracker-db and libstemmer all .so files.
 
 But I think what you meant was to make each language a .so which we
 dlopen() using something like GModule, right? That we can possibly do
 next week. I don't think this should stop the merge to be honest. There
 are bigger problems to address first.
 
  * The directory mtime issue on startup.
 
 Work on this will be continuing on Monday.
 
  Have I missed anything?
 
 New items:
 
 * Check mtime for summary files too.
 
  Im also adding my tracker-fts stuff into that branch so will likely
  merge when above + my stuff is ready
 
 Yea. Can you make sure your code compiles without warnings before
 committing in the future if possible :)

most of those warnings are in the original sqlite source - will try to
fix them

thanks for your efforts

jamie



Re: [Tracker] more issues with indexer-split

2008-09-03 Thread Martyn Russell
Jamie McCracken wrote:
 On Tue, 2008-09-02 at 12:23 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
 Could we also reduce memory usage by not statically linking to the
 private libs libtracker-common and libtracker-db?
 Those libraries should not be available for public use. Before doing so,
 each API would have to be:

 a) Documented
 b) Checked it needs to be public
 c) Versioned
 d) ...

 This is a lot of work and I don't think it is worth it.
 I haven't looked at the footprints myself though.
 
 
 why we would do all that?
 
 we would not be exporting the headers for those libs so no other apps
 outside of tracker source tree will be able to use it effectively
 
 surely there are some examples of private libs that are not statically
 linked?

I clearly misunderstood. I thought you meant making them available for
public use. I think making them .so libs that are only used privately is
a good idea.

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-03 Thread Martyn Russell
Jamie McCracken wrote:
 On Tue, 2008-09-02 at 12:23 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
 Could we also reduce memory usage by not statically linking to the
 private libs libtracker-common and libtracker-db?
 Those libraries should not be available for public use. Before doing so,
 each API would have to be:

 a) Documented
 b) Checked it needs to be public
 c) Versioned
 d) ...

 This is a lot of work and I don't think it is worth it.
 I haven't looked at the footprints myself though.

 currently my FTS module and the file-indexer-module are ~ 1MB in size
 due mostly to linking with them and im sure the size of trackerd and
 tracker-indexer could be made smaller too with only one instance of
 those libs in memory
 How does the memory footprint compare to the old tracker?

 
 having looked at the contents of libtracker-common, most of the memory
 used is for the stemmers - we load them all into memory even though we
 only use one of them. i think making each language stemmer a dynamically
 loaded module should help reduce things

I can look into doing this.

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-03 Thread Martyn Russell
Jamie McCracken wrote:
 trunk only checks directories (If a file in a directory is modified then
 the directories mtime is also altered so no need to check every file)
 hence startup is much faster.

Note: ONLY the mtime of the parent directory is updated. This is not
recursive. So if you have /foo/bar/baz/sliff.txt, the mtime of baz/ is
updated, but not those of bar/ and foo/.

This means you _HAVE_ to go into every directory to see if it has a
subdirectory with an mtime that has been updated.
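
To make the per-directory check concrete, here is a minimal sketch using POSIX `stat()`. The helper name `dir_changed_since` and the idea of passing in the stored mtime as a plain `time_t` are assumptions for illustration, not the Tracker implementation. Because only the immediate parent is touched, a check like this has to be run on every directory in the tree, not just the top-level one. The sketch also inherits the assumption under discussion in this thread, namely that the directory mtime reflects the changes we care about.

```c
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: returns 1 if the directory's current mtime differs
 * from the value stored at the last crawl (so its direct children need a
 * closer look), 0 if it is unchanged, and -1 if the path cannot be
 * stat()ed or is not a directory. */
int dir_changed_since(const char *path, time_t stored_mtime)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISDIR(st.st_mode))
        return -1;
    return st.st_mtime != stored_mtime;
}
```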

 We can do this. Can you guarantee that on EVERY file system type the
 parent directory mtime is updated when a file changes? I am not 100%
 sure this is the case.
 
 on all major platforms yes (*nix and windows)

Hmm. This worries me. How mtime is used across file systems tends to vary
slightly and this might come back to bite us.

 it is for me - its in the order of 3x slower than trunk at startup 

What exactly is 3x slower? The crawling?

I have been thinking about this. The best solution here to me is to send
ALL files/directories to the indexer and let the indexer check the mtime
of a directory before deciding to process the files it holds. This
should dramatically reduce the DB lookups on startup. But if the
slowness is NOT in the indexer, then there is little you can do except
increase the throttle. Have you tested it again recently, since I made
the throttle mandatory whenever it is called (i.e. it is 5 + the config
value)? This made a lot of difference for me.

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-03 Thread Martyn Russell
Hi,

So I have been reading up on the things that are remaining for merging.
This is the list I have so far which I will be working on:

* Check the move files/directories issue. I *think* it works.

* Fix the get_file_contents() function so it checks for #13 in the first
64Kb.

* Make private libraries .so files to dynamically load them.

* The directory mtime issue on startup.

Have I missed anything?

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-03 Thread Jamie McCracken
On Wed, 2008-09-03 at 12:34 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  trunk only checks directories (If a file in a directory is modified then
  the directories mtime is also altered so no need to check every file)
  hence startup is much faster.
 
 Note: the mtime of the parent directory ONLY is updated. This is not
 recursive. So if you have /foo/bar/baz/sliff.txt, the mtime of baz/ is
 updated not for bar/ and foo/.
 
 This means you _HAVE_ to go into every directory to see if it has a
 subdirectory with an mtime that has updated.

that is what trunk does - it only checks directories (and
subdirectories). There's no need to check the mtime of a file ever unless
the parent directory's mtime has changed

 
  We can do this. Can you guarantee that on EVERY file system type the
  parent directory mtime is updated when a file changes? I am not 100%
  sure this is the case.
  
  on all major platforms yes (*nix and windows)
 
 Hmm. This worries me. How mtime is used across file systems tends to vary
 slightly and this might come back to bite us.


It's not been a problem in the past for tracker and certainly won't be
for our target audience

 
  it is for me - its in the order of 3x slower than trunk at startup 
 
 What exactly is 3x slower? The crawling?
 
 I have been thinking about this. The best solution here to me is to send
 ALL files/directories to the indexer and let the indexer check the mtime
 of a directories before deciding to process the files it holds. This
 should dramatically reduce the DB lookups on startup. But if the
 slowness is NOT in the indexer, then there is little you can do except
 increase the throttle. Have you tested it again recently since I made
 throttle mandatory whenever it is called (i.e. it is 5+config value).
 This made a lot of difference for me.
 


trackerd should just pass directories at startup and let the indexer
work out what to process. Dbus is not optimised for passing large numbers
of strings. Can the current design easily accommodate this?


jamie



Re: [Tracker] more issues with indexer-split

2008-09-03 Thread Jamie McCracken
On Wed, 2008-09-03 at 12:34 +0100, Martyn Russell wrote:
 Hi,
 
 So I have been reading up on the things that are remaining for merging.
 This is the list I have so far which I will be working on:
 
 * Check the move files/directories issue. I *think* it works.

check the new directory name can be searched when doing a rename

also check the new name is searchable against all items in that
directory

 
 * Fix the get_file_contents() function so it checks for #13 in the first
 64Kb.
 
 * Make private libraries .so files to dynamically load them. 
Also for stemmer - make them dynamically loadable too
 
 * The directory mtime issue on startup.
 

also for summary files too - only check them if the mtime has changed

 Have I missed anything?

I think that is it. A lot of Prefs don't work but that can wait until
after the merge.

I'm also adding my tracker-fts stuff into that branch so will likely
merge when above + my stuff is ready

jamie





Re: [Tracker] more issues with indexer-split

2008-09-03 Thread Martyn Russell
Jamie McCracken wrote:
 trackerd should just pass directories at startup and let the indexer
 work out what to process. Dbus is not optimised for passing large number
 of strings. Can the current design easily accommodate this?

DBus' optimisation is not an issue here. I can send ALL of my files over
quicker than the indexer can mtime check ALL the directories in the
database.

Yes we can accommodate this. We simply send all files/directories to the
indexer and the indexer can check each parent directory first then
process the files or discard them if the parent directory mtime is up to
date.
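
The flow described above (check the parent directory first, then process or discard the file) could be sketched roughly as follows. The `dir_state` record and the `should_process` function are hypothetical names invented for this illustration. A real indexer would look the directory up in its database instead of a small in-memory array.

```c
#include <string.h>

/* Hypothetical record: one directory known to the index and whether its
 * on-disk mtime still matches the stored one. */
struct dir_state {
    const char *path;
    int up_to_date;
};

/* Returns 1 if the file should be processed, 0 if it can be discarded
 * because its parent directory is known and unchanged. */
int should_process(const char *file, const struct dir_state *dirs, size_t n)
{
    const char *slash = strrchr(file, '/');
    size_t parent_len, i;

    if (slash == NULL)
        return 1;               /* no parent component: process to be safe */
    parent_len = (size_t)(slash - file);

    for (i = 0; i < n; i++) {
        if (strlen(dirs[i].path) == parent_len &&
            strncmp(dirs[i].path, file, parent_len) == 0)
            return !dirs[i].up_to_date;
    }
    return 1;                   /* unknown directory: treat as new */
}
```

Files in directories the index has never seen are always processed, so a brand new directory is not skipped by mistake.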

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-03 Thread Jamie McCracken
On Wed, 2008-09-03 at 10:32 -0400, Jamie McCracken wrote:
 On Wed, 2008-09-03 at 12:34 +0100, Martyn Russell wrote:
  Hi,
  
  So I have been reading up on the things that are remaining for merging.
  This is the list I have so far which I will be working on:
  
  * Check the move files/directories issue. I *think* it works.
 
 check the new directory name can be searched when doing a rename
 
 also check the new name is searchable against all items in that
 directory
 
  
  * Fix the get_file_contents() function so it checks for #13 in the first
  64Kb.

also do what trunk does and validate each line. If it fails UTF-8
validation, attempt to convert from locale. Best to exit with null if any
part fails. I assume the gio stuff handles non-UTF-8?

jamie



Re: [Tracker] more issues with indexer-split

2008-09-03 Thread Philip Van Hoof
On Wed, 2008-09-03 at 15:31 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  trackerd should just pass directories at startup and let the indexer
  work out what to process. Dbus is not optimised for passing large number
  of strings. Can the current design easily accommodate this?
 
 DBus' optimisation is not an issue here. I can send ALL of my files over
 quicker than the indexer can mtime check ALL the directories in the
 database.

DBus only starts to perform badly once the message size grows over 4 KB.
You can fit quite a lot of URIs in 4 KB.

Therefore I don't think we should focus on reducing the number of URIs
we send from the daemon to the indexer.
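
If message size did turn out to matter, one way to stay under the 4 KB figure mentioned above would be to split the URI list into batches. The sketch below is an assumption-laden illustration: `next_batch` and `BATCH_LIMIT` are hypothetical names, and the cost model (string length plus one terminator byte) ignores the real D-Bus wire overhead.

```c
#include <string.h>

#define BATCH_LIMIT 4096  /* assumed threshold, from the 4 KB figure above */

/* Hypothetical helper: starting at uris[start], returns how many URIs fit
 * in one message whose payload (strings plus NUL terminators) stays within
 * limit bytes. Returns at least 1 while URIs remain, so an oversized URI
 * is still sent on its own. */
size_t next_batch(const char **uris, size_t n, size_t start, size_t limit)
{
    size_t used = 0, count = 0, i;

    for (i = start; i < n; i++) {
        size_t cost = strlen(uris[i]) + 1;

        if (count > 0 && used + cost > limit)
            break;
        used += cost;
        count++;
    }
    return count;
}
```

The sender would call `next_batch()` in a loop, advancing `start` by the returned count and sending one message per batch until it returns 0.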

 Yes we can accommodate this. We simply send all files/directories to the
 indexer and the indexer can check each parent directory first then
 process the files or discard them if the parent directory mtime is up to
 date.


-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be






Re: [Tracker] more issues with indexer-split

2008-09-03 Thread Jamie McCracken
On Wed, 2008-09-03 at 16:35 +0200, Philip Van Hoof wrote:
 On Wed, 2008-09-03 at 15:31 +0100, Martyn Russell wrote:
  Jamie McCracken wrote:
   trackerd should just pass directories at startup and let the indexer
   work out what to process. Dbus is not optimised for passing large number
   of strings. Can the current design easily accommodate this?
  
  DBus' optimisation is not an issue here. I can send ALL of my files over
  quicker than the indexer can mtime check ALL the directories in the
  database.
 
 DBus only starts to perform bad as soon as message size grows over 4 kb
 in size. In 4kb you can put quite a lot of uris.
 
 Therefore I don't think we should focus on reducing the amount of uris
 we send from the daemon to the indexer.
 

ok but let's see how it performs first

I want startup of a previously indexed machine to be as good as or close
to trunk

jamie



Re: [Tracker] more issues with indexer-split

2008-09-02 Thread Martyn Russell
Jamie McCracken wrote:
 Could we also reduce memory usage by not statically linking to the
 private libs libtracker-common and libtracker-db?

Those libraries should not be available for public use. Before doing so,
each API would have to be:

a) Documented
b) Checked it needs to be public
c) Versioned
d) ...

This is a lot of work and I don't think it is worth it.
I haven't looked at the footprints myself though.

 currently my FTS module and the file-indexer-module are ~ 1MB in size
 due mostly to linking with them and im sure the size of trackerd and
 tracker-indexer could be made smaller too with only one instance of
 those libs in memory

How does the memory footprint compare to the old tracker?

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-09-02 Thread Jamie McCracken
On Tue, 2008-09-02 at 12:17 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  Finding more performance issues
  
  on an up to date indexed home directory, the next restart of trackerd
  checks every single file to see if its up to date - why?
 
 Because we redesigned the whole code base and haven't finished
 optimising it yet.

fair enough


 
  trunk only checks directories (If a file in a directory is modified then
  the directories mtime is also altered so no need to check every file)
  hence startup is much faster.
 
 We can do this. Can you guarantee that on EVERY file system type the
 parent directory mtime is updated when a file changes? I am not 100%
 sure this is the case.

on all major platforms yes (*nix and windows)


 
  This needs to be restored as the performance of indexer-split is
  horrendous at startup
 
 It isn't that bad.

it is for me - it's on the order of 3x slower than trunk at startup


jamie



Re: [Tracker] more issues with indexer-split

2008-09-02 Thread Jamie McCracken
On Tue, 2008-09-02 at 12:23 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  Could we also reduce memory usage by not statically linking to the
  private libs libtracker-common and libtracker-db?
 
 Those libraries should not be available for public use. Before doing so,
 each API would have to be:
 
 a) Documented
 b) Checked it needs to be public
 c) Versioned
 d) ...
 
 This is a lot of work and I don't think it is worth it.
 I haven't looked at the footprints myself though.


why would we do all that?

we would not be exporting the headers for those libs so no other apps
outside of tracker source tree will be able to use it effectively

surely there are some examples of private libs that are not statically
linked?

 
  currently my FTS module and the file-indexer-module are ~ 1MB in size
  due mostly to linking with them and im sure the size of trackerd and
  tracker-indexer could be made smaller too with only one instance of
  those libs in memory
 
 How does the memory footprint compare to the old tracker?
 

resident memory is a lot higher, and that's even before it has started
indexing



Re: [Tracker] more issues with indexer-split

2008-08-23 Thread Jamie McCracken
Finding more performance issues

on an up to date indexed home directory, the next restart of trackerd
checks every single file to see if it's up to date - why?

trunk only checks directories (If a file in a directory is modified then
the directory's mtime is also altered so no need to check every file)
hence startup is much faster.

This needs to be restored as the performance of indexer-split is
horrendous at startup

ps same with summary files - they only need checking if the mtime differs
from the one in tracker's database

jamie


On Fri, 2008-08-22 at 15:11 +0200, Philip Van Hoof wrote:
 On Fri, 2008-08-22 at 11:42 +0100, Martyn Russell wrote:
  Jamie McCracken wrote:
   
  
   also search is still blocked for 10-20 seconds even when indexer is not
   active - why does it take so long to pause the indexer? It's completely
   unusable like that. the indexer must pause in under a second.
  
  We fixed that bug yesterday. I am not sure if it was the double free you
  added to your commit ;) or the fix Carlos added where the index was
  being reopened.
 
 (I think) It was the commit of the transaction in the indexer, when the
 indexer is asked to pause.
 
 cd branches/indexer-split
 svn diff -r 2134:2135
 



Re: [Tracker] more issues with indexer-split

2008-08-22 Thread Martyn Russell
Jamie McCracken wrote:
 still getting lots of serious problems when running
 
 
 email-contents.db does not grow - why are we not saving email body
 contents here? 
 
 Also what happened to email-index.db? Why have you combined files and
 emails index? 

Not sure why this was decided, but I think amongst the refactoring the
old code wasn't reinstated so we got rid of what was left over (of the
email-index.db types, etc).

Anyway, this has been re-added.

 remember the bigger the index the slower it is to update which is why we
 have separate indexes for emails and files (should be file-index.db and
 email-index.db)

Yep. The fix seems to alleviate this significantly.

 it also breaks tracker-search-tool as it tries to show emails category
 twice

I am not sure that's the reason it does that actually, but still.

 [EMAIL PROTECTED]:~/.cache/tracker$ ls -l
 total 152428
 -rw-r--r-- 1 jamie jamie    12288 2008-08-20 23:41 email-contents.db
 -rw-r--r-- 1 jamie jamie 13492224 2008-08-20 23:51 email-meta.db
 -rw-r--r-- 1 jamie jamie   107216 2008-08-20 23:51 email-meta.db-journal
 -rw-r--r-- 1 jamie jamie 42377216 2008-08-20 23:50 file-contents.db
 -rw-r--r-- 1 jamie jamie 17080320 2008-08-20 23:50 file-meta.db
 -rw-r--r-- 1 jamie jamie 80308976 2008-08-20 23:51 index.db
 -rw-r--r-- 1 jamie jamie  2359292 2008-08-20 23:41 index-update.db
 -rw-r--r-- 1 jamie jamie   151552 2008-08-20 23:41 xesam.db

Now we have email-index.db and file-index.db:

[EMAIL PROTECTED]:~$ ls -l /home/martyn/.cache/tracker/
total 680604
-rw-r--r-- 1 martyn martyn    729088 2008-08-22 11:39 email-contents.db
-rw-r--r-- 1 martyn martyn  74114696 2008-08-22 11:42 email-index.db
-rw-r--r-- 1 martyn martyn    114688 2008-08-22 11:39 email-meta.db
-rw-r--r-- 1 martyn martyn 177500160 2008-08-22 11:29 file-contents.db
-rw-r--r-- 1 martyn martyn 387498224 2008-08-22 11:30 file-index.db
-rw-r--r-- 1 martyn martyn   2359292 2008-08-22 10:59 file-index-update.db
-rw-r--r-- 1 martyn martyn  54284288 2008-08-22 11:30 file-meta.db
-rw-r--r-- 1 martyn martyn    151552 2008-08-22 10:59 xesam.db

 also after renaming a folder I could not search for the new name

I need to check this. I have not had time yet and I leave to go on
vacation tonight so I doubt I will be able to. I can leave this task to
Carlos, Ivan, Philip and Mikael to look at.

 also search is still blocked for 10-20 seconds even when indexer is not
 active - why does it take so long to pause the indexer? It's completely
 unusable like that. the indexer must pause in under a second.

We fixed that bug yesterday. I am not sure if it was the double free you
added to your commit ;) or the fix Carlos added where the index was
being reopened.

Either way, it seems infinitely better now.

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-08-22 Thread Philip Van Hoof
On Fri, 2008-08-22 at 11:42 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  
 
  also search is still blocked for 10-20 seconds even when indexer is not
  active - why does it take so long to pause the indexer? It's completely
  unusable like that. the indexer must pause in under a second.
 
 We fixed that bug yesterday. I am not sure if it was the double free you
 added to your commit ;) or the fix Carlos added where the index was
 being reopened.

(I think) It was the commit of the transaction in the indexer, when the
indexer is asked to pause.

cd branches/indexer-split
svn diff -r 2134:2135

-- 
Philip Van Hoof, freelance software developer
home: me at pvanhoof dot be 
gnome: pvanhoof at gnome dot org 
http://pvanhoof.be/blog
http://codeminded.be






Re: [Tracker] more issues with indexer-split

2008-08-20 Thread Martyn Russell
Jamie McCracken wrote:
 1) trackerd: Handle file moves - update files in a directory
 recursively when a directory is renamed/moved (need to pause indexer
 before updating - watch out!). Likewise re-enable update of index from
 trackerd as its needed for tagging and other user metadata

This should be committed now. From my brief testing it seemed to work.

-- 
Regards,
Martyn
___
tracker-list mailing list
tracker-list@gnome.org
http://mail.gnome.org/mailman/listinfo/tracker-list


Re: [Tracker] more issues with indexer-split

2008-08-20 Thread Jamie McCracken
On Wed, 2008-08-20 at 12:49 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  1) trackerd: Handle file moves - update files in a directory
  recursively when a directory is renamed/moved (need to pause indexer
  before updating - watch out!). Likewise re-enable update of index from
  trackerd as its needed for tagging and other user metadata
 
 This should be committed now. From my brief testing it seemed to work.
 

ok thanks

i will give it a spin tonight and providing nothing major is wrong we
can merge tomorrow

jamie



Re: [Tracker] more issues with indexer-split

2008-08-20 Thread Jamie McCracken
On Wed, 2008-08-20 at 12:49 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  1) trackerd: Handle file moves - update files in a directory
  recursively when a directory is renamed/moved (need to pause indexer
  before updating - watch out!). Likewise re-enable update of index from
  trackerd as its needed for tagging and other user metadata
 
 This should be committed now. From my brief testing it seemed to work.
 


still getting lots of serious problems when running


email-contents.db does not grow - why are we not saving email body
contents here? 

Also what happened to email-index.db? Why have you combined files and
emails index? 

remember the bigger the index the slower it is to update which is why we
have separate indexes for emails and files (should be file-index.db and
email-index.db)

it also breaks tracker-search-tool as it tries to show emails category
twice

[EMAIL PROTECTED]:~/.cache/tracker$ ls -l
total 152428
-rw-r--r-- 1 jamie jamie    12288 2008-08-20 23:41 email-contents.db
-rw-r--r-- 1 jamie jamie 13492224 2008-08-20 23:51 email-meta.db
-rw-r--r-- 1 jamie jamie   107216 2008-08-20 23:51 email-meta.db-journal
-rw-r--r-- 1 jamie jamie 42377216 2008-08-20 23:50 file-contents.db
-rw-r--r-- 1 jamie jamie 17080320 2008-08-20 23:50 file-meta.db
-rw-r--r-- 1 jamie jamie 80308976 2008-08-20 23:51 index.db
-rw-r--r-- 1 jamie jamie  2359292 2008-08-20 23:41 index-update.db
-rw-r--r-- 1 jamie jamie   151552 2008-08-20 23:41 xesam.db



also after renaming a folder I could not search for the new name

also search is still blocked for 10-20 seconds even when indexer is not
active - why does it take so long to pause the indexer? It's completely
unusable like that. the indexer must pause in under a second.


Can you fix the above please before merging (sorry I missed it before)

thanks

jamie





Re: [Tracker] more issues with indexer-split

2008-08-18 Thread Martyn Russell
Jamie McCracken wrote:
 On Fri, 2008-08-15 at 11:01 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
 As I will be working quite extensively on trunk post merge I require any
 major changes to be done ASAP

 list of changes required prior to merge (in order of priority) - all of
 these already exist and work in trunk:


 1) trackerd: Handle file moves - update files in a directory
 recursively when a directory is renamed/moved (need to pause indexer
 before updating - watch out!). Likewise re-enable update of index from
 trackerd as its needed for tagging and other user metadata
 Hi,

 This is done in the indexer, the daemon, however, currently has no way
 of knowing about files linked by moves. Instead, GIO gives us DELETED
 and CREATED events.

 That is quite unacceptable I think.

 I have created a bug report about this:

   http://bugzilla.gnome.org/show_bug.cgi?id=547890

 We can in the mean time perhaps add some glue by checking the md5sum of
 the 2 files to see if they are the same and the events occur within the
 same 2 seconds perhaps? I would rather not do this. But it might be
 necessary for the time being.

 
 can we fork the gio monitor code and inline it into our source tree
 then?
 
 when updated glib with that functionality is available we can swap it
 out

Not really.

The GIO code isn't the easiest code to fork. I think doing that would
take longer than finding another, easier solution.

I am investigating this.

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-08-18 Thread Jamie McCracken
On Mon, 2008-08-18 at 10:05 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  On Fri, 2008-08-15 at 11:01 +0100, Martyn Russell wrote:
  Jamie McCracken wrote:
  As I will be working quite extensively on trunk post merge I require any
  major changes to be done ASAP
 
  list of changes required prior to merge (in order of priority) - all of
  these already exist and work in trunk:
 
 
  1) trackerd: Handle file moves - update files in a directory
  recursively when a directory is renamed/moved (need to pause indexer
  before updating - watch out!). Likewise re-enable update of index from
  trackerd as its needed for tagging and other user metadata
  Hi,
 
  This is done in the indexer, the daemon, however, currently has no way
  of knowing about files linked by moves. Instead, GIO gives us DELETED
  and CREATED events.
 
  That is quite unacceptable I think.
 
  I have created a bug report about this:
 
http://bugzilla.gnome.org/show_bug.cgi?id=547890
 
  We can in the mean time perhaps add some glue by checking the md5sum of
  the 2 files to see if they are the same and the events occur within the
  same 2 seconds perhaps? I would rather not do this. But it might be
  necessary for the time being.
 
  
  can we fork the gio monitor code and inline it into our source tree
  then?
  
  when updated glib with that functionality is available we can swap it
  out
 
 Not really.
 
 The GIO code isn't the easiest code to fork. I think doing that would
 take longer than finding another, easier solution.
 
 I am investigating this.
 

I meant just the monitor code not the whole of GIO

jamie



Re: [Tracker] more issues with indexer-split

2008-08-17 Thread Jamie McCracken
On Fri, 2008-08-15 at 11:01 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  As I will be working quite extensively on trunk post merge I require any
  major changes to be done ASAP
  
  list of changes required prior to merge (in order of priority) - all of
  these already exist and work in trunk:
  
  
  1) trackerd: Handle file moves - update files in a directory
  recursively when a directory is renamed/moved (need to pause indexer
  before updating - watch out!). Likewise re-enable update of index from
  trackerd as its needed for tagging and other user metadata
 
 Hi,
 
 This is done in the indexer, the daemon, however, currently has no way
 of knowing about files linked by moves. Instead, GIO gives us DELETED
 and CREATED events.
 
 That is quite unacceptable I think.
 
 I have created a bug report about this:
 
   http://bugzilla.gnome.org/show_bug.cgi?id=547890
 
 We can in the mean time perhaps add some glue by checking the md5sum of
 the 2 files to see if they are the same and the events occur within the
 same 2 seconds perhaps? I would rather not do this. But it might be
 necessary for the time being.
 

can we fork the gio monitor code and inline it into our source tree
then?

when updated glib with that functionality is available we can swap it
out

jamie



Re: [Tracker] more issues with indexer-split

2008-08-15 Thread Martyn Russell
Jamie McCracken wrote:
 As I will be working quite extensively on trunk post merge I require any
 major changes to be done ASAP
 
 list of changes required prior to merge (in order of priority) - all of
 these already exist and work in trunk:
 
 
 1) trackerd: Handle file moves - update files in a directory
 recursively when a directory is renamed/moved (need to pause indexer
 before updating - watch out!). Likewise re-enable update of index from
 trackerd as its needed for tagging and other user metadata

Hi,

This is done in the indexer. The daemon, however, currently has no way
of knowing about files linked by moves. Instead, GIO gives us DELETED
and CREATED events.

That is quite unacceptable I think.

I have created a bug report about this:

  http://bugzilla.gnome.org/show_bug.cgi?id=547890

We can in the meantime perhaps add some glue by checking the md5sum of
the 2 files to see if they are the same and whether the events occur
within the same 2 seconds, perhaps? I would rather not do this. But it
might be necessary for the time being.
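
The glue suggested above could look roughly like the sketch below. The `file_event` struct, `looks_like_move` and `MOVE_WINDOW_SECS` are hypothetical names for illustration only. It also assumes the checksum of the deleted file was recorded when it was indexed, since it can no longer be computed once the file is gone.

```c
#include <string.h>
#include <time.h>

#define MOVE_WINDOW_SECS 2  /* the "same 2 seconds" suggested above */

/* Hypothetical event record; real monitor events carry more fields. */
struct file_event {
    const char *path;
    char md5[33];           /* hex digest, NUL-terminated */
    time_t when;
};

/* Returns 1 if a DELETED event followed by a CREATED event looks like a
 * move: identical checksums and timestamps no more than
 * MOVE_WINDOW_SECS apart. */
int looks_like_move(const struct file_event *deleted,
                    const struct file_event *created)
{
    double gap = difftime(created->when, deleted->when);

    if (gap < 0 || gap > MOVE_WINDOW_SECS)
        return 0;
    return strcmp(deleted->md5, created->md5) == 0;
}
```

This is a heuristic: two unrelated files with identical contents, one deleted and one created within the window, would be misread as a move.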

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-08-15 Thread Martyn Russell
Jamie McCracken wrote:
 On Wed, 2008-08-13 at 19:30 +0200, Carlos Garnacho wrote:
 As far as I see, for mbox you're storing the offset in the stream:

 msg_offset = g_mime_parser_tell (mf->parser);
 
 mail_msg->offset = msg_offset;
 
 For IMAP, I just get 0 in the Services table, also didn't get to see
 any code to do this.
 
 
 imap stores message count too - its count rather than byte offset

As Carlos says, this code is NOT working in TRUNK for IMAP. So this
whole argument is moot.

 no when junk/deleted email is encountered during the start up scan its
 UID is checked against that table  (JunkMails) to see if we already know
 about it. If its not in that table then we add it and then delete it
 from our index. Ergo its more efficient than what you have

The whole idea of keeping a separate table for deleted/junk email sounds
really inefficient to me. I have quite a bit and I get quite a bit every
day, that's a lot of extra processing. Surely it is MORE processing than
the current inefficiencies you are outlining with our current design?

 Could you tell me where's that code? The only users for
 InsertJunk/LookupJunk (the stored procedures) are
 tracker_db_email_insert_junk() and tracker_db_email_lookup_junk(), the
 former is also the only user of the latter, and it doesn't do what you
 mention.

 The only place I see where it could delete emails from the DB for
 Evolution is check_summary_file(), and tracker_db_email_delete_email()
 seems to be called inconditionally for any junk/deleted message found.
 
 the way it should work is as described above
 
 I had tested it and it works (deleted and junk emails are pruned on next
 restart of trackerd)

What you're saying here doesn't make a lot of sense to me. It sounds
like you're saying that if mail is marked as junk or deleted you don't
want to update the index until we restart the daemon? So people will
still be searching and finding junk until trackerd is restarted? That
doesn't sound right to me. Or did you mean something else?

 How do you currently tell which emails are new in the summary file?
 Without storing the count you cannot know without verifying each email
 exists in the services table (which would obviously be unacceptable
 performance wise)

You haven't answered the question. Where is the code?

 the trunk way is faster so i would prefer that restored

TRUNK doesn't work as you think it does.

 If you bear with me, I'd prefer to try a few optimizations before having
 to add special cases.
 well not doing the junk/deletion check everytime the summary file changes 
 must obviously be faster?

Plus Carlos is right, this code can probably be optimised much more than
it is now. It has just been written to get working so far.

 Sure, but it's also more beneficial for users if tracker DB contents are
 up to date with the actual data. Also, IMHO adding special cases like
 this would break a design that makes tracker really extensible and easy
 to develop for.

Carlos has spent a lot of time designing this.

I spoke further with him about it too, we could change the way we do
things now to use GTypeModule and GInterface to make it extensible, but
that will take a few days at least to do.

This issue in general is not a show stopper; it is a performance issue.
The performance issue we have with index.db (which you say you will fix
next week by using SQLite with FTS) is much more of an issue than this
by far. I would suggest we merge and resolve these on trunk so you can
get on, Jamie.

 that can be done easily -  for quick synch test just check last known
 UID in summary file (using stored message count) exists in services - if
 it does not then you have a count mismatch and a resync is required

I don't claim to know much about the UID, but what if you receive a mail
and delete a mail - won't the count be the same, so that your count
check for a resync breaks?
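
To make the question concrete, here is a small sketch in Python with hypothetical function names. It assumes UIDs are appended to the summary in order and that an expunged message disappears from it. It compares a count-only test with the "last known UID" test suggested above:

```python
def count_check_ok(stored_count, summary_uids):
    """Count-only test: passes whenever the number of messages is unchanged."""
    return stored_count == len(summary_uids)


def last_uid_check_ok(stored_count, summary_uids, indexed_uids):
    """Check that the UID at the stored count position is one we indexed."""
    if stored_count == 0:
        return True
    if stored_count > len(summary_uids):
        return False  # the summary shrank, so a resync is needed
    return summary_uids[stored_count - 1] in indexed_uids
```

With UIDs 1, 2, 3 indexed, deleting 2 and receiving 4 leaves the summary at `[1, 3, 4]`. The count is still 3, so the count-only test passes, while the UID at position 3 is now 4, which is not indexed, so the second test asks for a resync.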

 this can be done whenever a new email arrives as its not expensive
 
 suggest having a resync method to do above and a check_synch one to test
 its ok

We could have this soon, but it won't be today unfortunately. Carlos is
on vacation.

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-08-15 Thread Jamie McCracken
On Fri, 2008-08-15 at 11:01 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  As I will be working quite extensively on trunk post merge I require any
  major changes to be done ASAP
  
  list of changes required prior to merge (in order of priority) - all of
  these already exist and work in trunk:
  
  
  1) trackerd: Handle file moves - update files in a directory
  recursively when a directory is renamed/moved (need to pause indexer
  before updating - watch out!). Likewise re-enable update of index from
  trackerd as its needed for tagging and other user metadata
 
 Hi,
 
 This is done in the indexer, the daemon, however, currently has no way
 of knowing about files linked by moves. Instead, GIO gives us DELETED
 and CREATED events.
 
 That is quite unacceptable I think.
 
 I have created a bug report about this:
 
   http://bugzilla.gnome.org/show_bug.cgi?id=547890
 
 We can in the mean time perhaps add some glue by checking the md5sum of
 the 2 files to see if they are the same and the events occur within the
 same 2 seconds perhaps? I would rather not do this. But it might be
 necessary for the time being.
 


why not use the native inotify and just use gio file monitoring for the
others?

when gio has the new functionality we can then replace inotify with it

jamie



Re: [Tracker] more issues with indexer-split

2008-08-13 Thread Carlos Garnacho
Hi!,

On Tue, 2008-08-12 at 14:18 -0400, Jamie McCracken wrote:

snip

 that sounds inefficient - trunk only ever checked for existing deleted
 or junk emails at startup because iterating through all emails in the
 summary files is expensive. 

From what I've read in trunk code, you still iterate through all the
mails in the summary in check_summary_file(), and you will have to
iterate over them again later to index new messages, etc...

As far as I know, it's quite unavoidable to parse the summaries again,
since under some circumstances Message IDs could be reused, which would
leave you with inconsistent data in the DBs. Even if they aren't,
expunging a folder would render any stored offset for the summary file
useless (even dangerous).

Besides, when testing summary parsing, I remember it was pretty fast
(like 2-3 seconds for a ~6500 emails summary), of course without
inserting to DBs nor doing message body or attachments sniffing, which
is more or less what should happen if the junk/deleted flag is set.

 the use of a separate junk email table meant
 lookups were confined to that table and not the services table so was
 faster when number of emails was high

You mean the JunkMails table in email-meta.db? As far as I see, this
table is just looked up to make sure there aren't duplicates when
inserting. And in the end, you still have to lookup/modify the Services
table, even if the junk mail wasn't there.

 
 we should also avoid doing this whenever the summary file changes which
 is why we stored an offset in trunk so we skip over messages to get to
 the new ones only when summary files change or do nothing if no new ones
 are present

As said above, I think there are pretty good reasons to avoid this.

 
 the trunk way is faster so i would prefer that restored

If you bear with me, I'd prefer to try a few optimizations before having
to add special cases.

Regards,
   Carlos

 
 thanks
 
 jamie
 
-- 
Carlos Garnacho
Imendio AB - Expert solutions in GTK+
http://www.imendio.com



Re: [Tracker] more issues with indexer-split

2008-08-13 Thread Jamie McCracken
On Wed, 2008-08-13 at 17:12 +0200, Carlos Garnacho wrote:
 Hi!,
 
 On Tue, 2008-08-12 at 14:18 -0400, Jamie McCracken wrote:
 
 snip
 
  that sounds inefficient - trunk only ever checked for existing deleted
  or junk emails at startup because iterating through all emails in the
  summary files is expensive. 
 
 From what I've read in trunk code, you still iterate through all the
 mails in the summary in check_summary_file(), and you will have to
 iterate over them again later to index new messages, etc...

yes but when we are not doing the startup check, we are skipping so it's
faster and we are not stopping at any deleted or junk email and checking
it


 
 As far as I know, it's quite unavoidable to parse again summaries, since
 under some circumstances Message IDs could be reused, which would leave
 you with inconsistent data in the DBs. Even if it isn't, expunging a
 folder would render any stored offset for the summary file useless (even
 dangerous).

true but we would get a deletion from inotify of the summary file if
that was the case. It's not a byte offset but a message count - so we
skip x messages to get the new ones (similar to what beagle does)


 
 Besides, when testing summary parsing, I remember it was pretty fast
 (like 2-3 seconds for a ~6500 emails summary), of course without
 inserting to DBs nor doing message body or attachments sniffing, which
 is more or less what should happen if the junk/deleted flag is set.

with 100,000+ emails it's quite noticeable


 
  the use of a separate junk email table meant
  lookups were confined to that table and not the services table so was
  faster when number of emails was high
 
 You mean the JunkMails table in email-meta.db? As far as I see, this
 table is just looked up to make sure there aren't duplicates when
 inserting. And in the end, you still have to lookup/modify the Services
 table, even if the junk mail wasn't there.
 

no, when junk/deleted email is encountered during the startup scan its
UID is checked against that table (JunkMails) to see if we already know
about it. If it's not in that table then we add it and then delete it
from our index. Ergo it's more efficient than what you have


  
  we should also avoid doing this whenever the summary file changes which
  is why we stored an offset in trunk so we skip over messages to get to
  the new ones only when summary files change or do nothing if no new ones
  are present
 
 As said above, I think there are pretty good reasons to avoid this.
 
  
  the trunk way is faster so i would prefer that restored
 
 If you bear with me, I'd prefer to try a few optimizations before having
 to add special cases.

well, not doing the junk/deletion check every time the summary file
changes must obviously be faster?

jamie



Re: [Tracker] more issues with indexer-split

2008-08-13 Thread Martyn Russell
Jamie McCracken wrote:
 On Wed, 2008-08-13 at 17:12 +0200, Carlos Garnacho wrote:
 Hi!,

Hi :)

 On Tue, 2008-08-12 at 14:18 -0400, Jamie McCracken wrote:

 snip

 that sounds inefficient - trunk only ever checked for existing deleted
 or junk emails at startup because iterating through all emails in the
 summary files is expensive. 
 From what I've read in trunk code, you still iterate through all the
 mails in the summary in check_summary_file(), and you will have to
 iterate over them again later to index new messages, etc...
 
 yes but when we are not doing the startup check, we are skipping so its
 faster and we are not stopping at any deleted or junk email and checking
 it 

Of course it is faster, but that doesn't mean we are completely
synchronised - unless I missed the point. If you had an email in the
summary file before and then you mark it as deleted or junk, the summary
file is out of date. If this is done when tracker isn't running or on
another machine, etc. you would _HAVE_ to read the summary file again on
start up to make sure you were synchronised. At least that's how I
understand it.

 As far as I know, it's quite unavoidable to parse again summaries, since
 under some circumstances Message IDs could be reused, which would leave
 you with inconsistent data in the DBs. Even if it isn't, expunging a
 folder would render any stored offset for the summary file useless (even
 dangerous).
 
 true but we would get a deletion from inotify of the summary file if
 that was the case. Its not a byte offset but message count - so we skip
 x messages to get the new ones (similar to what beagle does)

As I illustrated above, you can't guarantee that either:

1) Tracker is running all the time
2) email isn't deleted/etc. from another machine/client/webserver/etc.

 Besides, when testing summary parsing, I remember it was pretty fast
 (like 2-3 seconds for a ~6500 emails summary), of course without
 inserting to DBs nor doing message body or attachments sniffing, which
 is more or less what should happen if the junk/deleted flag is set.
 
 with 100,000+ emails its quite noticeable

The difference is not really an issue. Most people don't have that many
emails. For those that do, they can expect to wait a bit longer. Really,
the difference you are arguing about here is insignificant. If you have
to wait another 30 seconds because you have a ridiculous number of
emails, I don't think that is a problem especially if you are
guaranteeing synchronicity.

 the use of a separate junk email table meant
 lookups were confined to that table and not the services table so was
 faster when number of emails was high
 You mean the JunkMails table in email-meta.db? As far as I see, this
 table is just looked up to make sure there aren't duplicates when
 inserting. And in the end, you still have to lookup/modify the Services
 table, even if the junk mail wasn't there.

 
 no when junk/deleted email is encountered during the start up scan its
 UID is checked against that table  (JunkMails) to see if we already know
 about it. If its not in that table then we add it and then delete it
 from our index. Ergo its more efficient than what you have

So if you remove this step completely and just check the index on start
up shouldn't it be JUST as efficient? Checking a table for junk and
keeping that synchronised should be just as wasteful as scanning the
summary file I would imagine.

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-08-13 Thread Jamie McCracken
On Wed, 2008-08-13 at 17:00 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  On Wed, 2008-08-13 at 17:12 +0200, Carlos Garnacho wrote:
  Hi!,
 
 Hi :)
 
  On Tue, 2008-08-12 at 14:18 -0400, Jamie McCracken wrote:
 
  snip
 
  that sounds inefficient - trunk only ever checked for existing deleted
  or junk emails at startup because iterating through all emails in the
  summary files is expensive. 
  From what I've read in trunk code, you still iterate through all the
  mails in the summary in check_summary_file(), and you will have to
  iterate over them again later to index new messages, etc...
  
  yes but when we are not doing the startup check, we are skipping so its
  faster and we are not stopping at any deleted or junk email and checking
  it 
 
 Of course it is faster, but that doesn't mean we are completely
 synchronised - unless I missed the point. If you had an email in the
 summary file before and then you mark it as deleted or junk, the summary
 file is out of date. If this is done when tracker isn't running or on
 another machine, etc. you would _HAVE_ to read the summary file again on
 start up to make sure you were synchronised. At least that's how I
 understand it.
 
  As far as I know, it's quite unavoidable to parse again summaries, since
  under some circumstances Message IDs could be reused, which would leave
  you with inconsistent data in the DBs. Even if it isn't, expunging a
  folder would render any stored offset for the summary file useless (even
  dangerous).
  
  true but we would get a deletion from inotify of the summary file if
  that was the case. Its not a byte offset but message count - so we skip
  x messages to get the new ones (similar to what beagle does)
 
 As I illustrated above, you can't guarantee Tracker is either:
 
 1) running all the time
 2) email isn't deleted/etc from another machine/client/webserver/etc.

such synchronicity can be done at startup - we will know if something is
modified

there's no need to attempt it every time an email arrives, which is my
point

it was optimised before but the changes in your branch have removed this
- please revert

jamie



Re: [Tracker] more issues with indexer-split

2008-08-13 Thread Carlos Garnacho
Hi :),

On Wed, 2008-08-13 at 11:47 -0400, Jamie McCracken wrote:
 On Wed, 2008-08-13 at 17:12 +0200, Carlos Garnacho wrote:
  Hi!,
  
  On Tue, 2008-08-12 at 14:18 -0400, Jamie McCracken wrote:
  
  snip
  
   that sounds inefficient - trunk only ever checked for existing deleted
   or junk emails at startup because iterating through all emails in the
   summary files is expensive. 
  
  From what I've read in trunk code, you still iterate through all the
  mails in the summary in check_summary_file(), and you will have to
  iterate over them again later to index new messages, etc...
 
 yes but when we are not doing the startup check, we are skipping so its
 faster and we are not stopping at any deleted or junk email and checking
 it 

How much time do you plan to save doing fseek() instead of fread()? I've
updated the code in indexer-split to just read over the message when it
gets to a deleted/junk message, and read_summary() could be changed to
do fseek() if no data is asked for. That makes the indexer-split code do
one pass where trunk does two. Less disk head movement I'd say :)

Also, take into account that you're forced to fread() even if you're
skipping a message, since you have to know the strings' lengths to be
able to skip them.
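
A small Python sketch of why that read cannot be avoided. The record layout here (a 4-byte big-endian length followed by the payload) is a made-up stand-in, not Evolution's actual summary format, and the function names are hypothetical:

```python
import io
import struct


def write_record(stream, payload):
    """Write one length-prefixed record (hypothetical layout)."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)


def skip_records(stream, count):
    """Skip up to `count` records and return how many were skipped."""
    skipped = 0
    while skipped < count:
        header = stream.read(4)  # this read is needed to learn the length
        if len(header) < 4:
            break  # end of data
        (length,) = struct.unpack(">I", header)
        stream.seek(length, io.SEEK_CUR)  # only the payload can be seeked over
        skipped += 1
    return skipped
```

Even when skipping, every record costs one small read. Seeking only saves the payload bytes.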

 
 
  
  As far as I know, it's quite unavoidable to parse again summaries, since
  under some circumstances Message IDs could be reused, which would leave
  you with inconsistent data in the DBs. Even if it isn't, expunging a
  folder would render any stored offset for the summary file useless (even
  dangerous).
 
 true but we would get a deletion from inotify of the summary file if
 that was the case. Its not a byte offset but message count - so we skip
 x messages to get the new ones (similar to what beagle does)

With Expunge I meant telling $MAIL_APP to get rid of deleted messages in
the mail folder. In Evolution that would change the summary file and
mess up offsets for sure.

As far as I see, for mbox you're storing the offset in the stream:

msg_offset = g_mime_parser_tell (mf->parser);

mail_msg->offset = msg_offset;

For IMAP, I just get 0 in the Services table, also didn't get to see
any code to do this.

 
 
  
  Besides, when testing summary parsing, I remember it was pretty fast
  (like 2-3 seconds for a ~6500 emails summary), of course without
  inserting to DBs nor doing message body or attachments sniffing, which
  is more or less what should happen if the junk/deleted flag is set.
 
 with 100,000+ emails its quite noticeable
 
 
  
   the use of a separate junk email table meant
   lookups were confined to that table and not the services table so was
   faster when number of emails was high
  
  You mean the JunkMails table in email-meta.db? As far as I see, this
  table is just looked up to make sure there aren't duplicates when
  inserting. And in the end, you still have to lookup/modify the Services
  table, even if the junk mail wasn't there.
  
 
 no when junk/deleted email is encountered during the start up scan its
 UID is checked against that table  (JunkMails) to see if we already know
 about it. If its not in that table then we add it and then delete it
 from our index. Ergo its more efficient than what you have

Could you tell me where's that code? The only users for
InsertJunk/LookupJunk (the stored procedures) are
tracker_db_email_insert_junk() and tracker_db_email_lookup_junk(), the
former is also the only user of the latter, and it doesn't do what you
mention.

The only place I see where it could delete emails from the DB for
Evolution is check_summary_file(), and tracker_db_email_delete_email()
seems to be called unconditionally for any junk/deleted message found.


 
 
   
   we should also avoid doing this whenever the summary file changes which
   is why we stored an offset in trunk so we skip over messages to get to
   the new ones only when summary files change or do nothing if no new ones
   are present
  
  As said above, I think there are pretty good reasons to avoid this.
  
   
   the trunk way is faster so i would prefer that restored
  
  If you bear with me, I'd prefer to try a few optimizations before having
  to add special cases.
 
 well not doing the junk/deletion check everytime the summary file changes 
 must obviously be faster?

Sure, but it's also more beneficial for users if tracker DB contents are
up to date with the actual data. Also, IMHO adding special cases like
this would break a design that makes tracker really extensible and easy
to develop for.

Regards,
   Carlos

-- 
Carlos Garnacho
Imendio AB - Expert solutions in GTK+
http://www.imendio.com



Re: [Tracker] more issues with indexer-split

2008-08-13 Thread Jamie McCracken
On Wed, 2008-08-13 at 19:30 +0200, Carlos Garnacho wrote:
 Hi :),
 
 On Wed, 2008-08-13 at 11:47 -0400, Jamie McCracken wrote:
  On Wed, 2008-08-13 at 17:12 +0200, Carlos Garnacho wrote:
   Hi!,
   
   On Tue, 2008-08-12 at 14:18 -0400, Jamie McCracken wrote:
   
   snip
   
    that sounds inefficient - trunk only ever checked for existing deleted
    or junk emails at startup because iterating through all emails in the
    summary files is expensive.
   
   From what I've read in trunk code, you still iterate through all the
   mails in the summary in check_summary_file(), and you will have to
   iterate over them again later to index new messages, etc...
  
  yes but when we are not doing the startup check, we are skipping so its
  faster and we are not stopping at any deleted or junk email and checking
  it 
 
 How much time to you plan to save doing fseek() instead of fread()? I've
 updated the code in indexer-split to just read over the message when it
 gets to a deleted/junk message, and read_summary() could be changed to
 do fseek() if no data is asked. That makes the indexer-split code do one
 pass where trunk does two. Less disk head movement I'd say :)
 
 Also, take into account that you're forced to fread() even if you're
 skipping a message, since you have to know strings length to be able to
 skip them.
 

I know that - what I want to avoid is doing lookups on the email
services table every time the mail module returns NULL for a message,
each time a new email arrives


  
  
   
   As far as I know, it's quite unavoidable to parse again summaries, since
   under some circumstances Message IDs could be reused, which would leave
   you with inconsistent data in the DBs. Even if it isn't, expunging a
   folder would render any stored offset for the summary file useless (even
   dangerous).
  
  true but we would get a deletion from inotify of the summary file if
  that was the case. Its not a byte offset but message count - so we skip
  x messages to get the new ones (similar to what beagle does)
 
 With Expunge I meant tell $MAIL_APP to get rid of deleted messages in
 the mail folder, in Evolution that would change the summary file and
 mess up offsets for sure.
 
 As far as I see, for mbox you're storing the offset in the stream:
 
 msg_offset = g_mime_parser_tell (mf->parser);
 
 mail_msg->offset = msg_offset;
 
 For IMAP, I just get 0 in the Services table, also didn't get to see
 any code to do this.


imap stores message count too - its count rather than byte offset

 
  
  
   
   Besides, when testing summary parsing, I remember it was pretty fast
   (like 2-3 seconds for a ~6500 emails summary), of course without
   inserting to DBs nor doing message body or attachments sniffing, which
   is more or less what should happen if the junk/deleted flag is set.
  
  with 100,000+ emails its quite noticeable
  
  
   
    the use of a separate junk email table meant
    lookups were confined to that table and not the services table so was
    faster when number of emails was high
   
   You mean the JunkMails table in email-meta.db? As far as I see, this
   table is just looked up to make sure there aren't duplicates when
   inserting. And in the end, you still have to lookup/modify the Services
   table, even if the junk mail wasn't there.
   
  
  no when junk/deleted email is encountered during the start up scan its
  UID is checked against that table  (JunkMails) to see if we already know
  about it. If its not in that table then we add it and then delete it
  from our index. Ergo its more efficient than what you have
 
 Could you tell me where's that code? The only users for
 InsertJunk/LookupJunk (the stored procedures) are
 tracker_db_email_insert_junk() and tracker_db_email_lookup_junk(), the
 former is also the only user of the latter, and it doesn't do what you
 mention.
 
 The only place I see where it could delete emails from the DB for
 Evolution is check_summary_file(), and tracker_db_email_delete_email()
 seems to be called inconditionally for any junk/deleted message found.

the way it should work is as described above

I had tested it and it works (deleted and junk emails are pruned on next
restart of trackerd)

How do you currently tell which emails are new in the summary file?
Without storing the count you cannot know without verifying each email
exists in the services table (which would obviously be unacceptable
performance wise)


 
 
  
  

    we should also avoid doing this whenever the summary file changes which
    is why we stored an offset in trunk so we skip over messages to get to
    the new ones only when summary files change or do nothing if no new ones
    are present
   
   As said above, I think there are pretty good reasons to avoid this.
   

    the trunk way is faster so i would prefer that restored
   
   If you bear with me, I'd prefer to try a few optimizations before having
   to add special cases.
  
  well not doing the junk/deletion 

Re: [Tracker] more issues with indexer-split

2008-08-12 Thread Carlos Garnacho
Hi!,

On Mon, 2008-08-11 at 22:49 -0400, Jamie McCracken wrote:
 As I will be working quite extensively on trunk post merge I require any
 major changes to be done ASAP
 
 list of changes required prior to merge (in order of priority) - all of
 these already exist and work in trunk:
 
 
 1) trackerd: Handle file moves - update files in a directory
 recursively when a directory is renamed/moved (need to pause indexer
 before updating - watch out!). Likewise re-enable update of index from
 trackerd as its needed for tagging and other user metadata
 
 2) I could not see deletion of deleted and junk emails at startup (I see
 it for new emails only but not for older emails that may have been
 marked) - pls restore this functionality from trunk. Code needs to check
 all summary files from start to end for junk and if not already marked
 as junk in the JunkMail table then you must delete them and mark them.
 If in doubt see trunk - older emails marked as deleted/junk will remain
 in index which is unacceptable. see function check_summary_file in
 trunks tracker-email-evolution.c

This item is done in a different way:

When trackerd notices the summary file has changed (or does the initial
files processing), it notifies the indexer.

The indexer, which is the one aware of the actual file contents,
iterates through the messages, and the mail module will return NULL
metadata for any junk/deleted email, making tracker-indexer delete
anything in the DBs/index related to these mails.
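
As a rough illustration only, that flow can be sketched in Python as follows. The function names and the dict-based "index" are hypothetical stand-ins for the mail module and the DBs, not the actual tracker-indexer code:

```python
def extract_metadata(message):
    """Stand-in for the mail module: None means junk or deleted."""
    if message["junk"] or message["deleted"]:
        return None
    return {"subject": message["subject"]}


def process_summary(messages, index):
    """One pass over the summary: index live mail, drop junk/deleted mail."""
    for message in messages:
        metadata = extract_metadata(message)
        if metadata is None:
            index.pop(message["uid"], None)  # remove anything stored for it
        else:
            index[message["uid"]] = metadata
    return index
```

The point of the design is that junk handling needs no special case: a missing-metadata result is treated like any other item that should no longer be in the index.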

Regards,
   Carlos



Re: [Tracker] more issues with indexer-split

2008-08-12 Thread Martyn Russell
Jamie McCracken wrote:
 As I will be working quite extensively on trunk post merge I require any
 major changes to be done ASAP

OK.

 list of changes required prior to merge (in order of priority) - all of
 these already exist and work in trunk:
 
 
 1) trackerd: Handle file moves - update files in a directory
 recursively when a directory is renamed/moved (need to pause indexer
 before updating - watch out!). Likewise re-enable update of index from
 trackerd as its needed for tagging and other user metadata

This was heavily worked on today, progress is ongoing.

 2) I could not see deletion of deleted and junk emails at startup (I see
 it for new emails only but not for older emails that may have been
 marked) - pls restore this functionality from trunk. Code needs to check
 all summary files from start to end for junk and if not already marked
 as junk in the JunkMail table then you must delete them and mark them.
 If in doubt see trunk - older emails marked as deleted/junk will remain
 in index which is unacceptable. see function check_summary_file in
 trunks tracker-email-evolution.c

Carlos has some comments to add here.

 3) Clean and timely shutdown of trackerd and indexer if running (maybe
 have a tracker-shutdown command as we have dbus interface for this which
 is needed by the prefs/applet). Anyway crawler must end immediately and
 must regularly force glib mainloop iterations so that any dbus requests
 are handled in a timely fashion (they appear to be heavily slowed down
 by the crawler)

Fixed.

 4) Pause on detected external IO (IE from other apps). If we detect
 writes in the watched directories the original trackerd paused for a few
 seconds. this greatly sped up things like compiling sources which were
 dog slow due to indexing. This appears to have been removed in the
 branch - see pause_io in tracker-status.c

Fixed.
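
For readers unfamiliar with item 4, the idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the pause_io code in tracker-status.c: the class name is hypothetical, and the 5 second default is a guess at "a few seconds":

```python
class IoPause:
    """Holds indexing back for a short time after an external write."""

    def __init__(self, pause_seconds=5.0):
        self.pause_seconds = pause_seconds
        self._resume_at = 0.0

    def on_external_write(self, now):
        # Each new write pushes the resume time further out.
        self._resume_at = now + self.pause_seconds

    def should_index(self, now):
        return now >= self._resume_at
```

During a long compile the writes keep arriving, so indexing stays paused until the directory has been quiet for the whole interval.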

 5) pause on battery (both for first time index and subsequent ones) -
 needs to be done I think? (appears commented out in tracker-status.c)

Fixed.

-- 
Regards,
Martyn


Re: [Tracker] more issues with indexer-split

2008-08-12 Thread Jamie McCracken
On Tue, 2008-08-12 at 18:01 +0100, Martyn Russell wrote:
 Jamie McCracken wrote:
  As I will be working quite extensively on trunk post merge I require any
  major changes to be done ASAP
 
 OK.
 
  list of changes required prior to merge (in order of priority) - all of
  these already exist and work in trunk:
  
  
  1) trackerd: Handle file moves - update files in a directory
  recursively when a directory is renamed/moved (need to pause indexer
  before updating - watch out!). Likewise re-enable update of index from
  trackerd as its needed for tagging and other user metadata
 
 This was heavily worked on today, progress is ongoing.
 
  2) I could not see deletion of deleted and junk emails at startup (I see
  it for new emails only but not for older emails that may have been
  marked) - pls restore this functionality from trunk. Code needs to check
  all summary files from start to end for junk and if not already marked
  as junk in the JunkMail table then you must delete them and mark them.
  If in doubt see trunk - older emails marked as deleted/junk will remain
  in index which is unacceptable. see function check_summary_file in
  trunks tracker-email-evolution.c
 
 Carlos has some comments to add here.
 
  3) Clean and timely shutdown of trackerd and indexer if running (maybe
  have a tracker-shutdown command as we have dbus interface for this which
  is needed by the prefs/applet). Anyway crawler must end immediately and
  must regularly force glib mainloop iterations so that any dbus requests
  are handled in a timely fashion (they appear to be heavily slowed down
  by the crawler)
 
 Fixed.
 
  4) Pause on detected external IO (IE from other apps). If we detect
  writes in the watched directories the original trackerd paused for a few
  seconds. this greatly sped up things like compiling sources which were
  dog slow due to indexing. This appears to have been removed in the
  branch - see pause_io in tracker-status.c
 
 Fixed.
 
  5) pause on battery (both for first time index and subsequent ones) -
  needs to be done I think? (appears commented out in tracker-status.c)
 
 Fixed.
 

thanks - sounds like it will be ready in a few days

jamie
