Re: Removing email from Xapian tier databases

2019-02-12 Thread Sebastian Hagedorn

No, I hadn't, but I have now – just in case.

Thanks, Sebastian

--
   .:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
  .:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.

Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus

Re: Removing email from Xapian tier databases

2019-02-11 Thread Bron Gondwana
Excellent - I hope you grabbed both that commit and the one afterwards where I 
fixed the order of CID parsing.

Actually, it might not be as big a deal on 3.0, but not calculating the CID 
first did break one JMAP case on future.

Cheers,

Bron.

--
 Bron Gondwana, CEO, FastMail Pty Ltd
 br...@fastmailteam.com



Re: Removing email from Xapian tier databases

2019-02-11 Thread Sebastian Hagedorn
Thanks! I rolled my own RPM with that patch, and I can confirm that it 
works.


--
   .:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
  .:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.


Re: Removing email from Xapian tier databases

2019-02-11 Thread Egoitz Aurrekoetxea
Many many many many thanks a lot Brong!! :) :) :) :) :) :) :)

---

EGOITZ AURREKOETXEA
Systems Department
944 209 470
Parque Tecnológico. Edificio 103
48170 Zamudio (Bizkaia)
ego...@sarenet.es
www.sarenet.es [1]
Before printing this email, consider whether it is necessary to do so.


Re: Removing email from Xapian tier databases

2019-02-11 Thread Egoitz Aurrekoetxea
Hi mates!

Just to finish this thread: can two squatter processes, one in rolling
mode and another moving data between Xapian databases and repacking
them as Brong described, run at the same time without known issues? I
ask to avoid damaging anything. Thanks a lot, mates!

Cheers!!

---

EGOITZ AURREKOETXEA
Systems Department
944 209 470
Parque Tecnológico. Edificio 103
48170 Zamudio (Bizkaia)
ego...@sarenet.es
www.sarenet.es [1]
Before printing this email, consider whether it is necessary to do so.


Links:
--
[1] http://www.sarenet.es

Re: Removing email from Xapian tier databases

2019-02-11 Thread Bron Gondwana
It's definitely safe to have one rolling mode writing and one repacking. I 
wouldn't run multiple repacks in parallel, as they can wind up doing duplicate 
work (though the end result should always be correct and safe).

Here's what we run:

# Any time the disk gets over 50%, compress -o single down to data
13 * * * * [% INCLUDE cronjob c='/home/mod_perl/hm/scripts/xapian_compact.pl -a -o -d 50 temp data' %]
# Copy the temporary search databases down to data during the week
43 1 * * 1,2,3,4,5,6 [% INCLUDE cronjob c='/home/mod_perl/hm/scripts/xapian_compact.pl -a temp,meta data' %]
# Sundays repack the entire data directory
43 1 * * 0 [% INCLUDE cronjob c='/home/mod_perl/hm/scripts/xapian_compact.pl -a temp,meta,data data' %]
# Late on Sundays, pack any oversized data directories down to archive
0 15 * * 0 [% INCLUDE cronjob c='/home/mod_perl/hm/scripts/xapian_archive.pl -a' %]

And here's the interesting logic. In xapian_compact.pl:

 if ($Opts{d}) {
     my $Path = $Slot->SearchPath();
     my $Usage = df($Path);
     my $RunUsage = df("/run/cyrus");
     return Process::Status->new(0)
         if ($Usage->{per} < $Opts{d} and $RunUsage->{per} < $Opts{d});
 }

 my @args = (-z => $dest, -t => $src);
 push @args, '-v' if $Opts{v};
 push @args, '-o' if $Opts{o};
 push @args, '-F' if $Opts{F};
 push @args, '-X' if $Opts{X};
 push @args, ('-T' => $Opts{T}) if $Opts{T};
 push @args, ('-u' => $Opts{u}) if $Opts{u};
 my %RunOpts = (
     PrintOutput => 1,
 );
 $RunOpts{Nice} = 1 unless $Opts{N};
 $RunOpts{Daemon} = 1 if $Opts{D};

 $0 = "xapian_compact: $SN";
 $Slot->RunCommand(\%RunOpts, 'squatter', @args);
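
For readers who do not read Perl, the wrapper logic above can be sketched roughly in Python. This is not the FastMail script: the function names `percent_used`, `should_compact` and `build_squatter_args` are made up for illustration, and only the squatter flags that appear in the excerpt (-z, -t, -v, -o, -F, -X, -T, -u) are used. `shutil.disk_usage` stands in for the Perl `df()` call, so the percentage may differ slightly from what df reports.

```python
import shutil


def percent_used(path):
    """Percentage of the filesystem holding `path` that is in use (approximate)."""
    usage = shutil.disk_usage(path)
    return usage.used * 100 / usage.total


def should_compact(search_pct, run_pct, threshold):
    """Mirror the Perl check: skip only when BOTH filesystems are below the threshold."""
    return not (search_pct < threshold and run_pct < threshold)


def build_squatter_args(dest, src, opts):
    """Build the squatter argument list in the same order as the Perl excerpt.

    `opts` is a hypothetical dict keyed by single-letter option names.
    """
    args = ["-z", dest, "-t", src]
    # Boolean switches are passed through as-is.
    for flag in ("v", "o", "F", "X"):
        if opts.get(flag):
            args.append("-" + flag)
    # Options that carry a value.
    for flag in ("T", "u"):
        if opts.get(flag):
            args += ["-" + flag, opts[flag]]
    return args
```

With `-d 50`, for example, `should_compact(60, 10, 50)` is true because one of the two filesystems is over the threshold, while `should_compact(10, 10, 50)` is false and the wrapper would exit early.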

And in xapian_archive.pl:

 my $Percent = $Opts{P} || 20;
 [...]

 foreach my $user (sort keys %$DataUsage) {
     my $au = $ArchiveUsage->{$user} || 1;
     my $du = $DataUsage->{$user} || 1;
     if ($du < 5000) {
         print "Too small $user ($du)\n";
         next;
     }
     my $This = int($du * 100 / $au);
     if ($This < $Percent) {
         print "Not enough dirty $user: ($du, $au)\n";
         next;
     }
     print "Recompacting $user: ($du, $au)\n";
     my @args = (-z => 'archive', -t => 'data,archive');
 [...]
 
In summary, repack data down to archive if data is more than 1/5 the size of the
existing archive. So each of these scripts is a wrapper around squatter to help
it run automatically.
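
The per-user decision in xapian_archive.pl can be restated as a small Python sketch. It is only an illustration of the rule described above: the function name `archive_decision` and the returned strings are hypothetical, the defaults (20 percent, 5000) are taken from the excerpt, and the unit of the usage figures is not stated in the thread, so treat the numbers as opaque sizes.

```python
def archive_decision(data_usage, archive_usage, percent=20, min_data=5000):
    """Decide whether a user's data tier should be repacked into archive.

    Mirrors the excerpt: missing or zero usage counts as 1, small data
    tiers are skipped, and otherwise data is repacked once it reaches
    `percent` percent of the archive size.
    """
    au = archive_usage or 1
    du = data_usage or 1
    if du < min_data:
        return "too small"
    ratio = int(du * 100 / au)  # data size as a whole percentage of archive size
    if ratio < percent:
        return "not dirty enough"
    return "recompact"
```

For example, a data tier of 10000 against an archive of 100000 is at 10 percent and is left alone, while 20000 against the same archive reaches 20 percent and would be recompacted with `-z archive -t data,archive`.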

Bron.



Re: Removing email from Xapian tier databases

2019-02-11 Thread Bron Gondwana
Yep, it's fixed in git now, so the next release will automatically create G 
keys for messages, even if they don't have a threadid!

Bron.


--
 Bron Gondwana, CEO, FastMail Pty Ltd
 br...@fastmailteam.com



Re: Removing email from Xapian tier databases

2019-02-11 Thread Egoitz Aurrekoetxea
Now I notice that, for instance, to move data between Xapian
databases you need to launch something like:

sudo -u cyrus /usr/cyrus/bin/squatter -C /usr/local/etc/imapd.conf -v -z archive -t temp,meta,data,archive -u user/ego...@sarenet.es

Perhaps it would be better to do:

sudo -u cyrus /usr/cyrus/bin/squatter -C /usr/local/etc/imapd.conf -F -v -z archive -t temp,meta,data,archive -u user/ego...@sarenet.es

But then, would having two squatter processes running at the same
time, one in rolling mode and one moving/repacking data, be an issue?

Thanks mates!!

---

EGOITZ AURREKOETXEA
Systems Department
944 209 470
Parque Tecnológico. Edificio 103
48170 Zamudio (Bizkaia)
ego...@sarenet.es
www.sarenet.es [1]
Before printing this email, consider whether it is necessary to do so.

On 11-02-2019 11:22, Egoitz Aurrekoetxea wrote:

> Hi Bron,
>
> So, it would be interesting to run it once a day, for instance in the
> events section of cyrus.conf:
>
> repack_xapian  cmd="squatter -F" at=0200
>
> Is it necessary to stop the other rolling squatter we run, in the same
> cyrus.conf, as:
>
> START {
>   # do not delete this entry!
>   recover   cmd="ctl_cyrusdb -r"
>
>   squatter cmd="squatter -R"
> }
>
> Thank you so much for all the clarifications, mate :) really :)
>
> Cheers!

Links:
--
[1] http://www.sarenet.es

Re: Removing email from Xapian tier databases

2019-02-11 Thread Sebastian Hagedorn
So running ctl_conversationsdb -z followed by -b would assign thread ids to 
those messages? Because it works when I do that. Clearly this is an edge 
case, but IMO it should be handled somehow other than silently failing ;-)


--
   .:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
  .:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.


Re: Removing email from Xapian tier databases

2019-02-11 Thread Bron Gondwana
That sounds like the source messages have no thread id, and hence they aren't 
being stored.

This is an interesting question actually, should we still store G keys for 
messages without thread identifier (CID)?

Bron.

On Mon, Feb 11, 2019, at 21:11, Sebastian Hagedorn wrote:
> Hi Bron,
>
> On 11 February 2019 at 04:23:16 -0500, Bron Gondwana wrote:
>
> > The data in conversations.db is added and removed in real time as
> > messages are appended and updated in the cyrus.index.
>
> Do you know why that does not seem to happen when using the "old" sync
> protocol for replication?
>
> Cheers,
> Sebastian

--
 Bron Gondwana, CEO, FastMail Pty Ltd
 br...@fastmailteam.com



Re: Removing email from Xapian tier databases

2019-02-11 Thread Bron Gondwana
Conversations.db is an index over lots of interesting bits of the message, but 
the key part that's used by Xapian is the mapping from G key (aka: GUID, aka: 
sha1 of the message RFC822 data) to individual email. It's used for 
deduplication and for mapping from results to messages.

The data in conversations.db is added and removed in real time as messages are 
appended and updated in the cyrus.index.

The data in the xapian databases on the other hand is append only - so you can 
wind up with hits that no longer map to existing emails. The way to solve that 
is with a xapian repack that filters messages - which can be done using the -F 
flag to squatter.
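
As a conceptual illustration only, the relationship described above can be modelled in a few lines of Python. This is not how Cyrus or Xapian store anything internally: `message_guid` and `repack_filter` are hypothetical names, and the lists stand in for the append-only search index and for the set of G keys currently present in conversations.db.

```python
import hashlib


def message_guid(raw_message: bytes) -> str:
    """GUID as described in the thread: the SHA-1 of the raw RFC 822 message data."""
    return hashlib.sha1(raw_message).hexdigest()


def repack_filter(indexed_guids, live_guids):
    """Keep only index entries whose GUID still maps to an existing message.

    This models what a filtering repack achieves: entries for messages
    that have since been removed are dropped from the rewritten index.
    """
    live = set(live_guids)
    return [guid for guid in indexed_guids if guid in live]
```

Two byte-identical copies of a message hash to the same GUID, which is what makes deduplication possible, and an index entry whose GUID is no longer live is exactly the kind of stale hit that a repack with -F is meant to drop.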

Cheers,

Bron.


--
 Bron Gondwana, CEO, FastMail Pty Ltd
 br...@fastmailteam.com



Removing email from Xapian tier databases

2019-02-09 Thread Egoitz Aurrekoetxea
Good morning,

As far as I understand, for Xapian you first create its conversations
database in order for it to work. Later you create database(s) for each
mailbox that Xapian can search in. You can move data between them, and
new mail gets indexed by, for instance, squatter in rolling mode. That's
fine and understood, I think. I was wondering: what happens when mail
indexed in the archive database is removed and then no longer exists in
the database? Does squatter in rolling mode handle that too?

By the way, if mail gets indexed in the tier databases (for instance,
at Fastmail: temp, meta, data, archive), what is the role or function
of the conversations databases you create with
ctl_conversationsdb -b -r?

Cheers!

-- 

EGOITZ AURREKOETXEA
Systems Department
944 209 470
Parque Tecnológico. Edificio 103
48170 Zamudio (Bizkaia)
ego...@sarenet.es
www.sarenet.es [1]
Before printing this email, consider whether it is necessary to do so.

Links:
--
[1] http://www.sarenet.es