Charlotte Hee wrote:
> Hello All,
>
> I am having trouble splitting words from titles from a list of
> research papers. I thought I could split the title into words like so:
>
> #!/usr/local/bin/perl
> use locale;
>
> %forums = ( 1 => 'B0->K+K-Ks',
>             2 => 'B+->K+KsKs Decays',
>             3 => 'Measurement of the Total Width',
>             4 => 'Asymmetries in B0->K0s pi0 Decays'
>           );
>
> foreach $forum ( sort keys %forums ){
>   my $title = $forums{$forum};
>   foreach $w (split /[^\w-]+/, $title) {
>     next unless ($w =~ /^[A-Za-z]/);
>     $title =~ /\b\Q$w\E\b/;
>     print "Journal $forum indexed word = " . ucfirst($w) . "\n";
>   }
> }
>
> exit;
>
> But the results show that I'm losing some characters:
>
> Journal 1 indexed word = B0-    # this should be B0->

No, because '>' matches the character class [^\w-], so split treats it
as a separator and discards it.

> Journal 1 indexed word = K      # what happened to the '+'?

Same as above: '+' is not in [\w-], so it is a separator too.

> Journal 1 indexed word = K-Ks
>
> Journal 2 indexed word = B      # '+->' missing

The '-' is there as a token of its own, but you're only printing tokens
that start with a letter.

> Journal 2 indexed word = K      # '+' missing
> Journal 2 indexed word = KsKs
> Journal 2 indexed word = Decays
>
> Journal 3 indexed word = Measurement
> Journal 3 indexed word = Of
> Journal 3 indexed word = The
> Journal 3 indexed word = Total
> Journal 3 indexed word = Width
>
> Journal 4 indexed word = Asymmetries
> Journal 4 indexed word = In
> Journal 4 indexed word = B0-    # should be 'B0->'
> Journal 4 indexed word = K0s
> Journal 4 indexed word = Pi0
> Journal 4 indexed word = Decays
>
> These are only example titles but the other titles have similar
> characters in them as part of a "word". I tried adding the '-' and
> '>' to my character class but that did not work. What am I doing
> wrong here?

It's not clear what you're defining as a "word". Why aren't you just
splitting on whitespace?

    foreach $w (split ' ', $title) {
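
For what it's worth, here is a minimal sketch of the whitespace approach, assuming a "word" simply means any whitespace-delimited token that starts with a letter. The helper name title_words is made up for illustration; it is not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): return the
# whitespace-delimited tokens of a title that begin with a letter.
sub title_words {
    my ($title) = @_;
    # split ' ' is the special form: it ignores leading whitespace
    # and splits on runs of whitespace, so '->', '+' and '-' stay
    # attached to the token they appear in.
    return grep { /^[A-Za-z]/ } split ' ', $title;
}

my %forums = (
    1 => 'B0->K+K-Ks',
    2 => 'B+->K+KsKs Decays',
    3 => 'Measurement of the Total Width',
    4 => 'Asymmetries in B0->K0s pi0 Decays',
);

foreach my $forum (sort keys %forums) {
    foreach my $w (title_words($forums{$forum})) {
        print "Journal $forum indexed word = ", ucfirst($w), "\n";
    }
}
```

With this, title 1 comes back as the single token 'B0->K+K-Ks'. If you need to break it up further (say, at the '->'), you'd have to spell out that rule separately.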