Ramprasad A Padmanabhan wrote:
I know this is more of an algorithm question but please bear with me.
In my program I am checking wether a emailid exists in a list
I have in the complete_list a string like
$complete_string=[EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED] ...
[EMAIL PROTECTED];
# that is a string of emailids sorted
Now I want to find out if another ID [EMAIL PROTECTED] exists in this list
How is it possible to optimize the search given that the complete list
string is sorted in order
Don't optimize--at least not until you are sure your getting the information
right. It is just the wrong emphasis.
Here are a couple of approaches that have worked for me. The first, used in
building an index, is predicated on the concept of saving all possible
information. The second, used in loading a tree structure, takes as a design
imperative that any node should focus only on one prime reference, in this case
the post being replied to. The wealth of possibilities can be a recipe for
inifnite loops when loading a tree, hence it's better to throw away some
information than to follow potentially circular links.
First approach:
Setup:
The only information available to this routine is the $message_info hash
reference. This has the form:
$message_info = {
$sequence_number1 = {
'Message-ID' = $message_id,
Subject = $subject
...
}
...
}
The references item is a potentially multi-item string, and this routine
handles this. $reference_index and $no_references are sent into the
gather_References_items function to load their data structures. The latter,
no_references, is an afterthought.
sub create_References_index {
my $message_info = $_[0];
my $reference_index = {};
my $no_references = [];
gather_References_items($message_info, $reference_index, $no_references);
open OUT, dbm/referenc.ndi or die
Could not open file to write Reference index: $!;
foreach my $column_value (sort {uc $a cmp uc $b} keys %$reference_index) {
my $index_row = column_index_line($reference_index, $column_value);
print OUT $index_row\n;
}
close OUT or die
Could not close file after writing Reference index: $!;
create_no_reference_index($no_references);
}
sub gather_References_items {
my ($message_info, $reference_index, $no_references) = @_;
foreach my $msg_key (keys %$message_info) {
my $data_field = $message_info-{$msg_key}-{'References'};
if (!$data_field) {
push(@$no_references, $msg_key);
}
next if !$data_field or $data_field !~ /\S/;
$data_field =~ s/^(.*)$/$1/;
my @references = split /\s*/, $data_field;
foreach my $reference (@references) {
$reference_index-{$reference} = [] # create the anonymous target array
if !defined($reference_index-{$reference}); # if it doesn't exist
push @{$reference_index-{$reference}}, $msg_key;
}
}
}
The above very hadily preserves all sequence numbers of posts referencing any
given Message-ID. The following is more focused on getting just one [hoefully
the best, but more importantly cogent]. It is used in the active process of
loading a threaded mail index. As I said, the focus here is on keeping the
data lean and focused. That is why, instead of dealing with the whole list of
References items, I just pop until I find the one that seems best suited, then
get out, leaving extraneous data behind. I tried using different data
structures to hold different kinds of relationships, and ended up fighting with
the loading routine all day. This works:
sub establish_parentage {
my ($message_details, $subjects, $message_ids, $threads) = @_;
my $no_references = [];
my $candidate_roots = [sort {$a = $b} keys %$message_details];
my $parents = {};
my $temp = [];
while (my $candidate = shift @$candidate_roots) {
seek_explicit_references($candidate, $message_details,
$no_references, $parents, $message_ids);
}
$candidate_roots = $no_references;
$no_references = [];
while (my $candidate = shift @$candidate_roots) {
seek_implicit_references($candidate, $message_details, $no_references,
$parents,
$subjects);
}
return $parents;
}
sub seek_explicit_references {
my ($candidate, $message_details, $no_references, $parents, $message_ids) =
@_;
my $details = $message_details-{$candidate};
return if $details-{'parent'};
my $references = $details-{'References'};
$references =~ s/^\s+//;
$references =~ s/\s+$//;
my @references = split /\s+/, $references;
my $found = 0;
while (my $reference = pop @references) {
if (my $parent_sequence = $message_ids-{$reference}) {
$details-{'parent'} = $parent_sequence;
$parents-{$candidate} = $parent_sequence;
$found = 1;
last;
}
}
push @$no_references, $candidate if not $found;
}
sub seek_implicit_references {
my ($candidate, $message_details, $no_references, $parents, $subjects) = @_;
# We won't