Sorting hash values

2006-04-05 Thread Baskaran Sankaran
Hi group,

I am trying to process a hash (words as keys and frequency counts in an
array as values). Instead of just sorting it by values, I want to sort
it by probability, and I tried this:

foreach $rule (sort { \compute_probability($b) <=>
    \compute_probability($a) } keys %rules_new) {
    $freq1 = $rules_new{$rule}->[0]; $freq1 = 0 if (!defined $freq1);
    $freq2 = $rules_new{$rule}->[1]; $freq2 = 0 if (!defined $freq2);

    $prob1 = $freq1 / ($freq1 + $freq2);
    $prob2 = $freq2 / ($freq1 + $freq2);

    print O "$rule\t$freq1\t$freq2\t$prob1\t$prob2\n";
}


sub compute_probability() {
    my $key = shift;
    my $prob = $rules_new{$key}->[0] / ($rules_new{$key}->[0] +
        $rules_new{$key}->[1]);
    return $prob;
}


And it doesn't work. Any help is appreciated.

Thanks,
Baskaran
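A minimal sketch of one way to get the intended ordering. It makes assumptions that are not in the post: the hash is filled with made-up sample data, a missing count is treated as 0, and ties are broken alphabetically. The main changes are these. The sort compares the probabilities themselves with `<=>`, because putting `\` in front of the call takes a reference to the result and compares memory addresses instead. The empty `()` prototype is dropped. An all-zero entry returns 0 instead of dividing by zero.

```perl
use strict;
use warnings;

# Hypothetical sample data: word => [count1, count2]
my %rules_new = (
    alpha => [3, 1],
    beta  => [1, 3],
    gamma => [2, 2],
    delta => [5],        # second count missing, treated as 0
);

# Probability of the first count; undefined counts become 0 and an
# all-zero entry gets probability 0 rather than a division by zero.
sub compute_probability {
    my ($key) = @_;
    my ($f1, $f2) = map { defined $_ ? $_ : 0 } @{ $rules_new{$key} }[0, 1];
    my $total = $f1 + $f2;
    return $total ? $f1 / $total : 0;
}

# Compare the numbers themselves with <=>, highest probability first.
my @sorted = sort {
    compute_probability($b) <=> compute_probability($a) or $a cmp $b
} keys %rules_new;

for my $rule (@sorted) {
    my ($f1, $f2) = map { defined $_ ? $_ : 0 } @{ $rules_new{$rule} }[0, 1];
    my $total = $f1 + $f2;
    my ($p1, $p2) = $total ? ($f1 / $total, $f2 / $total) : (0, 0);
    printf "%s\t%d\t%d\t%.3f\t%.3f\n", $rule, $f1, $f2, $p1, $p2;
}
```

The defined-or operator `//` would be shorter, but it needs Perl 5.10, so the sketch keeps the explicit `defined` test used in the post.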

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response




RE: store Array in hash ?

2006-03-30 Thread Baskaran Sankaran
You can try this:

$test->{$setup}->{'data'} = [$data[0], $data[1]];

As you said, a plain reference is not going to work if your array values
change at run time. You can also access specific elements of the array
individually whenever required.
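To illustrate, here is a small sketch with hypothetical keys and data. `[ @data ]` copies whatever `@data` holds at that moment into a new anonymous array, so reusing `@data` afterwards does not change the stored copy. By contrast, `\@data` would point at the array itself. Assigning `@data` directly, as in the question, stores the number of elements (scalar context) rather than the array, and that is why the later loop failed with a "Can't use string" error.

```perl
use strict;
use warnings;

my $test  = {};
my $setup = 'imap';    # hypothetical key prefix
my @data;              # one working array, reused on every round

for my $round (1 .. 2) {
    @data = ("msg$round-a", "msg$round-b");    # overwrite the working array
    # [ @data ] builds a new anonymous array holding a copy of the current
    # elements, so later changes to @data do not affect what was stored.
    $test->{"$setup$round"}->{'data'} = [ @data ];
}

@data = ();    # clearing the working array leaves the stored copies intact

for my $key (sort keys %$test) {
    my $stored = $test->{$key}->{'data'};
    # @$stored in a numeric comparison gives the element count.
    for (my $c = 0; $c < @$stored; $c++) {
        print "$key: $stored->[$c]\n";
    }
}
```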
 
Baskaran



From: Michael Gale [mailto:[EMAIL PROTECTED]
Sent: Thu 30/03/2006 13:31
Cc: beginners@perl.org
Subject: Re: store Array in hash ?



Hey,

I think I am supposed to use a reference here? If so, I can't, because
the data array keeps getting overwritten and reused again and again.

I guess I will come up with a different solution.

Michael


Michael Gale wrote:
 Hello,

 I have set up a hash like the following:

 my $test;

 $test->{$setup}->{'opt'} = "OK";

 It works fine; now I want to save an array to the hash:

 my @data;

 push(@data, "test");

 $test->{$setup}->{'data'} = @data;

 Now, how do I use it in a for loop later in the script? I tried:

 for (my $c = 0; $c < $test->{$setup}->{'data'}; $c++) {
 print $test->{$setup}->{'data'}[$c];
 }

 But that just returns:
 Can't use string ("2") as an ARRAY ref while "strict refs" in use at
 ./imap-watch.pl line 380.

 Michael











Tree:Nary and Unicode

2006-03-20 Thread Baskaran Sankaran
Hi group,

My Unicode blues continue… One more question on handling Unicode.

Does the Tree::Nary module work with Unicode data? I am trying to store some
Unicode strings (UTF-8, to be precise) in an n-ary tree.

When I retrieve the values of the nodes from the tree, the output appears as
boxes, even though I am using binmode to activate the utf-8 layer and am
reasonably sure that I am not making any mistakes. I should also mention that
the same code works well with ordinary data (such as English strings). Anyway,
here is the code:

Thanks in advance.

-B

 

---

 

The file ‘temp1.txt’ will have this sequence: पढ़ते है . राम पुस्तक एक

 

 

use Tree::Nary;
use warnings;

open(O, ">children.txt");
binmode(O, ":utf8");

open(I, "<utf8", "temp1.txt");
while (<I>) {
    chomp;
    @words = split(/\s+/, $_);
}
close(I);

$root = Tree::Nary->new($words[0]);
$node = Tree::Nary->new();

$node101 = $node->insert_data($root, 1, $words[1]);
$node102 = $node->append_data($root, $words[2]);

$node103 = $node->prepend_data($root, $words[3]);
$node231 = $node->insert_data($node103, 1, $words[4]);
$node232 = $node->prepend_data($node103, $words[5]);

undef $elements;
$node->traverse($root, $Tree::Nary::PRE_ORDER, $Tree::Nary::TRAVERSE_ALL, -1,
    \&building_node);
print O "Elements of the Tree : $elements\n";

$total_nodes = $node->n_nodes($root, $Tree::Nary::TRAVERSE_ALL);
print O "The tree has $total_nodes nodes in total\n";

close(O);

sub building_node() {
    my $node = shift;
    my $child = $node->{data};
    if (!defined $elements) { $elements .= $child; }
    else { $elements .= " " . $child; }

    return ($Tree::Nary::FALSE);
}
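One way to narrow down where the boxes come from, before Tree::Nary is involved at all, is to check whether the words are decoded into characters when they are read. A minimal sketch, assuming the input is UTF-8: it reads the bytes of राम from an in-memory file through an `:encoding(UTF-8)` layer and checks the character count. The count is 3 when decoding worked and would be 9 if the raw bytes came through undecoded. The sample word and the in-memory file are stand-ins for temp1.txt.

```perl
use strict;
use warnings;

# Stand-in for temp1.txt: the UTF-8 bytes of U+0930 U+093E U+092E,
# three bytes per character.
my $bytes = "\xE0\xA4\xB0\xE0\xA4\xBE\xE0\xA4\xAE";

# Read through an explicit decoding layer, as the real file would be.
open(my $in, '<:encoding(UTF-8)', \$bytes)
    or die "cannot open in-memory file: $!";
my $word = <$in>;
close($in);

# 3 means the word was decoded into characters; 9 would mean raw bytes.
printf "length in characters: %d\n", length $word;

# The output handle needs its own layer too, or wide characters warn.
binmode(STDOUT, ':encoding(UTF-8)');
print "$word\n";
```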



RE: Tree:Nary and Unicode

2006-03-20 Thread Baskaran Sankaran


-Original Message-
From: Mr. Shawn H. Corey [mailto:[EMAIL PROTECTED] 
Sent: 20 March 2006 16:25
To: beginners@perl.org
Subject: Re: Tree:Nary and Unicode

Baskaran Sankaran wrote:
 use Tree::Nary;
 use warnings;

 open(O, ">children.txt");
 binmode(O, ":utf8");

 open(I, "<utf8", "temp1.txt");

Shouldn't this be:

open( I, '<:utf8', 'temp1.txt' ) or die "cannot open temp1.txt: $!";


 This doesn't make a difference, I suppose. Also, I just checked it to
be sure, and I didn't see any difference.


-- 

Just my 0.0002 million dollars worth,
--- Shawn

For the things we have to learn before we can do them,
we learn by doing them.
   Aristotle

The man who sets out to carry a cat by its tail learns something that
will always be useful and which will never grow dim or doubtful.
   Mark Twain

Believe in the Divine, but paddle away from the rocks.
   Hindu Proverb

* Perl tutorials at http://perlmonks.org/?node=Tutorials
* A searchable perldoc is at http://perldoc.perl.org/









Regular expression - Global substitution

2006-03-13 Thread Baskaran Sankaran
Hi Group,

 

I am working with Unicode (UTF-8 encoded) text and am facing a problem with a
regular expression:

 

  s/(\p{HinNumerals})\s+($tokenize_string)+\s+(\p{HinNumerals})/$1$2$3/g;

 

and my HinNumerals is defined as:

 

sub HinNumerals {
  return <<END;
0966\t096F
END
}
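For reference, perlunicode describes user-defined properties as subroutines whose names begin with `In` or `Is`. Each returns one hexadecimal `start<TAB>end` range per line. The following is a minimal sketch of the same U+0966..U+096F range under that rule, using the hypothetical name `IsHinNumeral`:

```perl
use strict;
use warnings;

# User-defined property: the name must start with "In" or "Is".
# Each line of the returned string is a hex range "start\tend".
sub IsHinNumeral {
    return "0966\t096F\n";
}

# Two Devanagari digits (U+0968, U+0966) followed by ASCII digits.
my $text = "\x{0968}\x{0966} 42";
my @hindi_digits = $text =~ /(\p{IsHinNumeral})/g;
print scalar(@hindi_digits), "\n";    # prints 2: the ASCII 4 and 2 do not match
```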

 

$tokenize_string is a set of punctuation marks joined into a single string with
‘|’ between them for alternation (OR). Here it is:

$tokenize_string = 
\x{0021}|\x{0022}|\x{0023}|\x{0025}|\x{0026}|\x{0027}|\x{0028}|\x{0029}|\x{002A}|\x{002B}|\x{002C}|\x{002D}|\x{002E}|\x{002F}|\x{003A}|\x{003B}|\x{003C}|\x{003D}|\x{003E}|\x{003F}|\x{0040}|\x{005B}|\x{005C}|\x{005D}|\x{005E}|\x{005F}|\x{007B}|\x{007C}|\x{007D}|\x{007E}|\x{0964}|\x{0965};

 

Initially, $_ contains इस बिन्दु पर लाभ बहुत ही कम , रु .२ , ००० – ४ , ००० रु 
प्रति कार हैं । In the substitution I am trying to remove the blank 
spaces between the Hindi numerals and the number separators, so I want the 
substring २ , ००० – ४ , ००० to become २,०००–४,०००. HinNumerals defines the 
range of the Hindi numerals, and my regular expression looks for punctuation 
marks sandwiched (with spaces on both sides) between Hindi numerals and removes 
the spaces; the /g modifier ensures that it is applied in as many places as 
possible in $_.

 

But the result is weird: I get बिन्दु पर लाभ बहुत ही कम , रु .२,०००–४ , ००० 
रु प्रति कार हैं । The regexp applies correctly to the first two 
instances, namely २ , ० and ० – ४, but doesn't work for the last instance, 
which is ४ , ०. I am puzzled why this happens.

 

Any clue or solution would be useful.

 

Baskaran
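A likely explanation, offered as a sketch rather than a definitive diagnosis, starts from how /g works: each new match begins where the previous one ended. Because the pattern consumes the numeral on both sides, matching ० – ४ uses up the ४, and ४ , ० can then no longer start a match. Lookbehind and lookahead check the neighbouring numerals without consuming them. The sketch makes two substitutions for brevity. It uses a character class `[\x{0966}-\x{096F}]` in place of `\p{HinNumerals}`. It also uses a small hypothetical punctuation set (comma, hyphen-minus, en dash U+2013) in place of `$tokenize_string`. If the real alternation is built in a double-quoted string, escapes such as `\x{002A}`, `\x{002E}` and `\x{007C}` turn into the regex metacharacters `*`, `.` and `|`, so a character class or `quotemeta` is safer.

```perl
use strict;
use warnings;

# Devanagari digits U+0966..U+096F (stand-in for \p{HinNumerals}).
my $digit = qr/[\x{0966}-\x{096F}]/;
# Hypothetical punctuation subset: comma, hyphen-minus, en dash (U+2013).
my $punct = qr/[,\-\x{2013}]/;

sub squeeze_separators {
    my ($text) = @_;
    # (?<=...) and (?=...) test the neighbouring digits without consuming
    # them, so a digit between two separators can serve both matches.
    $text =~ s/(?<=$digit)\s+($punct)\s+(?=$digit)/$1/g;
    return $text;
}

# The sample substring, written with escapes to keep the source ASCII.
my $in  = "\x{0968} , \x{0966}\x{0966}\x{0966} \x{2013} "
        . "\x{096A} , \x{0966}\x{0966}\x{0966}";
my $out = squeeze_separators($in);

binmode(STDOUT, ':encoding(UTF-8)');
print "$out\n";    # २,०००–४,०००
```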



RE: FW: Reading a Unicode text file

2006-03-08 Thread Baskaran Sankaran
Hi group,

 

Finally I got Perl to understand my file (the context is the mail quoted
below). The problem was how the file was saved. This time I saved the file
as UTF-8 (in Notepad) and the same code now works, including the regular
expression features; earlier I had saved it as a "Unicode" text file. But I
think Perl doesn't allow us to do I/O with "Unicode" as the encoding on the
handle (instead of utf8).
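A note and a sketch on the UTF-8 route, under assumptions. Windows Notepad's UTF-8 option writes a byte-order mark (bytes EF BB BF) at the start of the file. An `:encoding(UTF-8)` layer passes the mark through as the character U+FEFF, so it can end up glued to the first word. The hypothetical helper `read_utf8_lines` strips it from the first line only. For the original "Unicode" files, Perl can also read UTF-16 through an `:encoding(UTF-16)` layer.

```perl
use strict;
use warnings;

# Hypothetical helper: read a UTF-8 text file (possibly saved by Notepad
# with a BOM) and return its lines without line endings.
sub read_utf8_lines {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path) or die "cannot open $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        $line =~ s/^\x{FEFF}// if $. == 1;   # drop the BOM on the first line only
        $line =~ s/\r?\n\z//;                # accept CRLF or LF endings
        push @lines, $line;
    }
    close($fh);
    return @lines;
}

if (@ARGV) {
    binmode(STDOUT, ':encoding(UTF-8)');
    print "$_\n" for read_utf8_lines($ARGV[0]);
}
```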

 

Thanks to those who answered earlier.

Baskaran

 

 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Phoenix
Sent: 20 February 2006 11:56
To: Baskaran Sankaran
Cc: beginners@perl.org
Subject: Re: FW: Reading a Unicode text file

 

On 2/17/06, Baskaran Sankaran [EMAIL PROTECTED] wrote:

 

 File: Sample_Hin.txt

 दूसरे राज्य पुनर्गठन आयोग के गठन का यही सही वक्त है।

 The sample files were created in Windows in Unicode (both English and Hindi)
 and I am able to open them in Notepad and WordPad. But, as you see, the output
 is garbage and somehow misses the utf8. Apart from this, a blank space is
 added after every character in both English and Hindi.

 

I've done a little experimenting, and I think you're right and Perl is
wrong here. At least, Perl seemingly disagrees with some common tools
about what a utf8 file is. I confess that I don't know enough about
utf8 to be certain.

If you don't get any better responses soon, you could use perlbug to
file a bug report. It is best if you can include a (small) utf8 file,
such as the first few lines of your Sample_Hin.txt file. But it's
important that the exact file contents be part of the bug report, not
just the text. One way would be if you can include a URL where the
files could be downloaded. But if the files are small enough, you can
convert them to a textual form (such as a hex dump) and include them
with your bug report.
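Following the hex-dump suggestion, here is a small sketch that prints the first bytes of each file named on the command line and reports a byte-order mark if there is one. The helper names `hex_head` and `sniff_bom` are hypothetical. A file saved by Notepad as "Unicode" would be expected to start with FF FE (UTF-16 little-endian), and one saved as UTF-8 with EF BB BF.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $n bytes of a file as hex pairs.
sub hex_head {
    my ($path, $n) = @_;
    open(my $fh, '<:raw', $path) or die "cannot open $path: $!";
    my $got = read($fh, my $buf, $n);
    close($fh);
    die "cannot read $path: $!" unless defined $got;
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

# Hypothetical helper: name the byte-order mark a file starts with, if any.
# (A UTF-32LE file also starts with FF FE; that case is not separated here.)
sub sniff_bom {
    my ($path) = @_;
    my $head = hex_head($path, 3);
    return 'UTF-8 with BOM' if $head =~ /^EF BB BF/;
    return 'UTF-16LE'       if $head =~ /^FF FE/;
    return 'UTF-16BE'       if $head =~ /^FE FF/;
    return 'no BOM';
}

for my $file (@ARGV) {
    printf "%s: %s\n  %s\n", $file, sniff_bom($file), hex_head($file, 16);
}
```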

 

Good luck with it!

 

--Tom Phoenix

Stonehenge Perl Training



FW: Reading a Unicode text file

2006-02-17 Thread Baskaran Sankaran
Hi group,

 

Opening a file for output is fine. I've also moved the binmode to immediately
after the open, outside the loop, but the error still shows. Here are the
sample files; my command is:

 

 perl number_sent.pl Sample_Eng.txt Sample_Hin.txt Sample_out.txt

 

File: Sample_Eng.txt

The time is ripe for a second SRC.

Most of this new-found resistance has begun after September 7 when the equity 
sale in oil companies was put off for three months.

The 20-year-old has just signed Jaal-The Trap with the balding Sunny Deol.

The elderly climber was elated on becoming the oldest man to conquer the peak.

Which means control over the press through the new owner-just about covered the 
principal.

Kashmir, as with many Pakistani soldiers and policy makers, has been his blind 
spot.

Don't knock it.

Anomaly has been noticed in supplying of Natural Gas to different industries in 
the North-East.

A former biochemist, figured out the art of running a successful business after 
a series of bad experiences with Just Desserts and Under the Over in Mumbai and 
Kottiyum in Bangalore.

The police then ruled out facilitating any meeting with Kalra since he is on 
bail.

 

File: Sample_Hin.txt

दूसरे राज्य पुनर्गठन आयोग के गठन का यही सही वक्त है।

अभी सामने आया यह ज्यादातर विरोध ७ सितंबर के बाद शुरू हुआ, जब तेल कंपनियों की 
इक्विटी बिक्री तीन महीने के लिए स्थगित कर दी गई।

कुल २० साल की इस बाला ने हाल ही में सनी देओल के साथ जाल-द ट्रैप फिल्म साइन की 
है।

वे इस शिखर पर पहुँचने वाले सबसे बुजुर्ग पर्वतारोही हैं।

जिसका मतलब यह है कि नए मालिक के जरिए प्रेस पर स्वेन का ही नियंत्रण रहेगा।

जहां तक कश्मीर का सवाल है, पाकिस्तान के दूसरे सैनिक तानाशाहों और नीति नियंताओं 
की तरह यह उनके लिए उलझन भरा मामला है।

इसे खारिज मत कीजिए।

पूर्वोत्तर में विभिन्न उद्योगों को प्राकृतिक गैस की आपूर्ति में असमानता देखी गई 
है।

पूर्व बायोकेमिस्ट मुंबई में जस्ट डेजर्ट्स तथा अंडर द ओवर और बंगलूर में कोट्टियम 
के बुरे अनुभवों के बाद व्यवसाय चलाने का गुर सीख गए हैं।

पुलिस ने कालरा से बटोही की मुलाकात भी नहीं करवाई क्योंकि वह जमानत पर है।

 

File: Sample_out.txt

1. ÿþT h e   t i m e   i s   r i p e   f o r   a   s e c o n d   S R C . 

 

ÿþ  B  8  0  G0  
 M / * A  (   0  M  
  (/   K
    G   (    /   
  9@   89  @   5   
  M $9H  d  

 

 

 

2.  M o s t   o f   t h i s   n e w - f o u n d   r e s i s t a n c e   h a s   
b e g u n   a f t e r   S e p t e m b e r   7   w h e n   t h e   e q u i t y   
s a l e   i n   o i l   c o m p a n i e s   w a s   p u t   o f f   f o r   t h 
r e e   m o n t h s . 

 

 -   @   8  .   (   G  
 /   / 9   M  
   /  $  05  
  ?  0  K  ' m 8?  
$  [1],   0   G,   
6A  0  B9
A   ,    , $G  2   
    [1] *   (   ?  /   K  
[1]      @    M 5  ?  
­@   , ?   M 0 
 @   $@ (  . 9  @   
  (   G   G2? 
8M %  ?$   0   
 @   d  

 

And it goes on for 10 sentences. The sample files were created in Windows in 
Unicode (both English and Hindi), and I am able to open them in Notepad and 
WordPad. But, as you can see, the output is garbage and somehow misses the utf8. 
Apart from this, a blank space is added after every character in both English 
and Hindi.
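These symptoms are consistent with the files being UTF-16LE, which is what Notepad's "Unicode" option writes. The following sketch uses the core Encode module to show the bytes such a file starts with. FF FE is the byte-order mark, which displays as ÿþ when read byte by byte as Latin-1. Each ASCII character is followed by a 00 byte, which many viewers render as a space.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode a BOM plus "The" the way Notepad's "Unicode" option stores text.
my $bytes = encode('UTF-16LE', "\x{FEFF}The");

# Show each byte in hex.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
print "$hex\n";    # FF FE 54 00 68 00 65 00
```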

 

Baskaran

 



RE: Reading a Unicode text file

2006-02-16 Thread Baskaran Sankaran
Here is the new code:

use strict;
use warnings;

my $file1 = $ARGV[0];
my $file2 = $ARGV[1];

open(F1, "<:utf8", $file1) || die("Can not find file $file1\n");
open(F2, "<:utf8", $file2) || die("Can not find file $file2\n");
open(O, ">:utf8", $ARGV[2]);

my $line_eng;
my $line_hin;
my $line_no = 1;

while (<F1>) {
    chomp;
    $line_eng = $_;

    $line_hin = <F2>;
    chomp($line_hin);

    binmode(O, ":utf8");
    print O "$line_no. $line_eng\n$line_hin\n\n\n";
    $line_no++;
}

close(F1);
close(F2);
close(O);

Here I am trying to read two text files (with an equal number of lines)
and print their contents into a single file named on the command line. The
files are in Unicode. I am using ActivePerl v5.8.7 on Windows.
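If the two input files are UTF-16 (Notepad's "Unicode" format), the same merge could be sketched as follows: read them through an `:encoding(UTF-16)` layer and write UTF-8. The sketch assumes both files start with a BOM. `:raw` first removes the default layers, and `:crlf` is pushed on top so that CRLF endings are translated after decoding, which is the usual arrangement on Windows. The sub name `merge_parallel` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sub: merge two parallel UTF-16 files (BOM expected) into one
# UTF-8 file, numbering each English/Hindi pair.
sub merge_parallel {
    my ($file1, $file2, $outfile) = @_;
    # :raw drops the default layers, :encoding(UTF-16) uses the BOM to pick
    # the byte order, and :crlf on top turns CRLF into "\n" after decoding.
    my $in_layers = '<:raw:encoding(UTF-16):crlf';
    open(my $eng, $in_layers, $file1) or die "cannot open $file1: $!";
    open(my $hin, $in_layers, $file2) or die "cannot open $file2: $!";
    # The output layer is set once, at open time, not inside the loop.
    open(my $out, '>:encoding(UTF-8)', $outfile) or die "cannot open $outfile: $!";

    my $line_no = 1;
    while (my $line_eng = <$eng>) {
        my $line_hin = <$hin>;
        last unless defined $line_hin;    # stop if the second file is shorter
        chomp($line_eng, $line_hin);
        print $out "$line_no. $line_eng\n$line_hin\n\n\n";
        $line_no++;
    }

    close($eng);
    close($hin);
    close($out) or die "cannot close $outfile: $!";
}

merge_parallel(@ARGV) if @ARGV == 3;
```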

Thanks for the help.
Baskaran



-Original Message-
From: JupiterHost.Net [mailto:[EMAIL PROTECTED] 
Sent: 09 February 2006 21:04
To: beginners@perl.org
Subject: Re: Reading a Unicode text file



Baskaran Sankaran wrote:

 Thanks for that, but I still face a problem. I did that and it raises a
 warning:

 utf8 "\xFF" does not map to Unicode at second.pl line 8,
 <$lang_sample_fh> line 1.

Excellent, so now we have some info to start with :)


 I have 10 sentences, each on a different line, in the input file (the input
 file doesn't contain any blank lines between sentences). The output file gets
 generated, but only alternate lines are displayed properly. Lines 2, 4,
 6, ... are displayed as junk boxes in Unicode.

ok:

Send the new code so we can see what line 8 is, and please send a
representation of the data in the file, the results, and what you
expect. (I.e., imagine we can't pull what you expect to happen and what is
happening from thin air; all I have is an obscure warning, something
about every other line being goofed, and something about Unicode *'s.)

Thanks ;)








Practical Unicode handling in Perl

2006-02-16 Thread Baskaran Sankaran
Hi group,

 

Is there a source for practical Unicode handling in Perl with example
programs (some sort of cookbook would be useful)?

Man pages like perlunicode and perluniintro are fine, but I feel they are
far from practical. Any pointers would be great.

 

Thanks,

Baskaran



Reading a Unicode text file

2006-02-09 Thread Baskaran Sankaran
Hi,

 

I am trying to read a Unicode text file and the code fragment is as
below:

 

$infile = "sample_hindi.txt";
$outfile = "out.txt";

open(F, "<:utf8", $infile);
while (<F>) {
  chomp;
  binmode(STDOUT, ":utf8");
  print $_;
}
close(F);

 

Only alternate lines of the output are displayed correctly; the other
lines appear as boxes. However, when I print an * (asterisk) at the end
of each line by changing the 'print' to:

 

print "$_\x{002A}";

 

it displays all the lines correctly along with *. Is the problem due to
the newline char at the end of every line?
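If this file is also UTF-16LE, as the other messages in this thread suggest, the newline is part of the story. In UTF-16LE, "\n" is stored as the two bytes 0A 00, but a byte- or UTF-8-oriented reader ends the line right after the 0A byte. A sketch with the core Encode module shows where the 00 half ends up when UTF-16LE bytes are split into lines that way:

```perl
use strict;
use warnings;
use Encode qw(encode);

# "ab\ncd\n" as UTF-16LE: every character, "\n" included, takes two bytes.
my $bytes = encode('UTF-16LE', "ab\ncd\n");

# Split after each 0A byte, the way a byte-oriented line reader would.
my @lines = split /(?<=\n)/, $bytes;

for my $line (@lines) {
    print join(' ', map { sprintf '%02X', ord } split //, $line), "\n";
}
# Prints:
# 61 00 62 00 0A
# 00 63 00 64 00 0A
# 00
```

Every line after the first starts with the stray 00 byte, so its byte pairs no longer line up with the UTF-16 characters.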

 

Thanks in advance for the help.

 

Baskaran