Python File Search

2010-01-11 Thread Eknath Venkataramani
correct.txt snippet:
1 2 1
1 3 3
1 5 21
1 7 19

union_output_TEMP.txt snippet:
1 2 1_NN
1 3 3_VBZ
1 3 5_VBZ
1 3 2_VBZ
1 5 21_VB
1 7 19_NN
1 9 14_VB

I need to group the entries of union_output_TEMP.txt that also appear in
correct.txt by their tag, and get the output in categorized.txt as:
NN={1 7 19, 1 2 1}
VBZ={1 3 3}
VB={1 5 21}

in Python.
Kindly help.

Full correct.txt:
1 2 1
1 3 3
1 5 21
1 7 19
1 10 13
1 11 14
1 12 12
1 13 11
1 13 9
1 13 8
1 13 10
1 14 7
1 15 6
1 16 22
2 1 4
2 2 1
2 2 2
2 3 3
3 1 5
3 1 4
3 2 1
3 3 2
3 4 3
4 1 1
4 2 2
4 3 5
4 5 3
4 6 4
4 7 6
5 2 1
5 5 5
5 8 2
5 10 3
5 13 16
5 14 18
5 15 15
5 16 7
5 17 8
5 18 11
5 18 12
5 18 9
5 18 13
5 19 23
6 1 3
6 1 4
6 2 1
6 3 2
7 1 1
7 2 2
7 3 3
7 4 12
7 6 10
7 7 11
7 8 8
7 9 5
7 9 4
7 10 6
7 13 13
8 2 1
8 3 12
8 4 11
8 5 10
8 6 7
8 7 8
8 8 9
8 13 5
8 14 14
9 1 4
9 2 1
9 3 3
9 3 2
9 4 3
9 5 3
10 1 11
10 2 9
10 3 1
10 4 2
10 5 3
10 6 4
10 7 5
10 8 6
10 9 7
10 10 13
10 12 14
10 16 20
10 17 21
10 19 16
10 20 18
10 20 17
10 22 18
10 23 26
11 1 5
11 2 1
11 3 3
12 2 1
12 3 26
12 4 25
12 5 10
12 8 7
12 8 9
12 10 12
12 14 20
12 15 14
12 16 15
12 17 16
12 18 18
12 19 19
12 20 27

Full union_output_TEMP.txt:
1 2 1_NN
1 3 3_VBZ
1 3 5_VBZ
1 3 2_VBZ
1 5 21_VB
1 7 19_NN
1 9 14_VB
1 10 17_JJR
1 10 18_JJR
1 10 15_JJR
1 10 13_JJR
1 10 21_JJR
1 10 14_JJR
1 11 14_NNS
1 12 12_IN
1 13 11_NNS
1 13 14_NNS
1 13 9_NNS
1 13 8_NNS
1 13 10_NNS
1 13 6_NNS
1 14 7_IN
1 15 6_NNS
1 16 22_.
2 1 4_VBG
2 2 5_JJ
2 2 1_JJ
2 2 2_JJ
2 2 4_JJ
2 3 3_NN
3 1 5_NN
3 1 3_NN
3 1 6_NN
3 1 4_NN
3 2 3_NN
3 2 1_NN
3 3 2_CC
3 4 3_NNS
4 1 1_NN
4 2 2_NN
4 3 5_VBZ
4 5 3_JJ
4 6 4_NN
4 7 6_.
5 2 1_NN
5 3 21_VBZ
5 4 18_VBG
5 5 5_IN
5 7 11_JJ
5 8 2_NN
5 10 3_NNS
5 12 18_VB
5 13 16_NN
5 14 21_NNS
5 14 18_NNS
5 14 19_NNS
5 14 20_NNS
5 15 15_IN
5 16 7_NNS
5 17 8_CC
5 18 11_NNS
5 18 12_NNS
5 18 9_NNS
5 18 6_NNS
5 18 13_NNS
5 18 22_NNS
5 18 8_NNS
5 19 23_.
6 1 3_VBG
6 1 4_VBG
6 2 1_NN
6 3 2_NN
7 1 1_JJ
7 2 2_NN
7 3 3_NN
7 4 12_VBZ
7 5 11_DT
7 6 11_JJ
7 6 7_JJ
7 6 10_JJ
7 6 9_JJ
7 7 11_NN
7 8 8_IN
7 9 5_DT
7 9 4_DT
7 10 10_JJ
7 10 6_JJ
7 11 10_NN
7 12 10_NN
7 13 13_.
8 2 1_NN
8 3 12_VBZ
8 3 3_VBZ
8 3 11_VBZ
8 3 4_VBZ
8 3 2_VBZ
8 4 11_TO
8 5 10_VB
8 6 7_JJ
8 7 8_NN
8 7 9_NN
8 8 9_NN
8 11 5_NN
8 13 5_NNS
8 14 14_.
9 1 3_VBG
9 1 1_VBG
9 1 4_VBG
9 2 1_JJ
9 3 3_NNS
9 3 2_NNS
9 4 3_IN
9 5 5_NN
9 5 6_NN
9 5 3_NN
10 1 11_VBG
10 1 1_VBG
10 2 9_IN
10 3 1_NN
10 4 2_,
10 5 3_NN
10 6 5_NNS
10 6 4_NNS
10 6 9_NNS
10 7 5_CC
10 8 6_JJ
10 9 7_NNS
10 10 13_,
10 12 14_NN
10 13 15_VBZ
10 14 21_VBN
10 14 12_VBN
10 15 21_RP
10 16 20_NN
10 16 23_NN
10 16 22_NN
10 16 21_NN
10 16 7_NN
10 17 21_NNS
10 17 10_NNS
10 18 24_IN
10 19 16_JJ
10 20 18_NNS
10 20 17_NNS
10 21 8_IN
10 22 18_NN
10 23 26_.
11 1 2_VBG
11 1 3_VBG
11 1 5_VBG
11 2 1_JJ
11 3 3_NN
11 3 6_NN
12 2 1_NN
12 3 26_MD
12 4 25_VB
12 5 10_VB
12 6 9_NN
12 7 7_NN
12 7 15_NN
12 8 10_NNS
12 8 7_NNS
12 8 2_NNS
12 8 6_NNS
12 8 4_NNS
12 8 9_NNS
12 10 12_CC
12 11 18_VB
12 12 19_PRP$
12 13 19_NN
12 14 20_IN
12 15 14_JJ
12 15 13_JJ
12 16 15_CC
12 17 16_JJ
12 18 18_NN
12 18 26_NN
12 18 24_NN
12 18 19_NN
12 18 22_NN
12 18 11_NN
12 18 25_NN
12 18 21_NN
12 19 19_NNS

Which data structure should I use?
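
One possible answer, as a minimal sketch: a set holding the triples from correct.txt (for fast membership tests) plus a dict that maps each tag to the list of matching triples. The function name `categorize` and the inline sample lists are illustrative assumptions; a real script would read the lines from the two files instead. It also assumes the tag is whatever follows the last underscore on a line.

```python
from collections import defaultdict

def categorize(correct_lines, tagged_lines):
    """Group tagged triples by tag, keeping only triples listed as correct."""
    # A set gives constant-time membership tests for the "correct" triples.
    correct = {" ".join(line.split()) for line in correct_lines if line.strip()}
    groups = defaultdict(list)
    for line in tagged_lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last underscore: "1 2 1_NN" -> ("1 2 1", "NN").
        triple, _, tag = line.rpartition("_")
        triple = " ".join(triple.split())
        if triple in correct:
            groups[tag].append(triple)
    return groups

# Sample data copied from the snippets in the post.
correct_lines = ["1 2 1", "1 3 3", "1 5 21", "1 7 19"]
tagged_lines = ["1 2 1_NN", "1 3 3_VBZ", "1 3 5_VBZ", "1 3 2_VBZ",
                "1 5 21_VB", "1 7 19_NN", "1 9 14_VB"]

groups = categorize(correct_lines, tagged_lines)
for tag, triples in groups.items():
    print("%s={%s}" % (tag, ", ".join(triples)))
```

Triples are kept in the order they appear in the tagged file, so the order inside the braces may differ from the order shown in the request.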

2010-01-14 Thread Eknath Venkataramani
I have a txt file in the following format:
"confident" => {
  count => 4,
  trans => {
 "ashahvasahta" => 0.74918568,
"atahmavaishahvaasa" => 0.09095465,
"pahraaram\.nbha" => 0.06990729,
 "mailatae" => 0.02856427,
   "utanai" => 0.01929341,
 "anaa" => 0.01578552,
 "uthaanae" => 0.01403157,
 "jaitanae" => 0.01227762,
"consumers" => {
  count => 4,
  trans => {
"upabhaokahtaa" => 0.75144362,
"upabhaokahtaaom\.n" => 0.12980166,
"sauda\�\�\�dha" => 0.11875471,
"a" => {
  count => 1164,
  trans => {
  "eka" => 0.14900491,
   "kaisai" => 0.08834675,
 "haai" => 0.06774697,
 "kaoi" => 0.05394308,
  "kai" => 0.04981982,
 "\(none\)" => 0.04400085,
  "kaa" => 0.03726579,
  "kae" => 0.03446450,

I need to extract "confident" and "ashahvasahta" from the first record,
"consumers" and "upabhaokahtaa" from the second record, and so on,
i.e. the English word and the first word in its list of probable translations.

Thanks in advance.
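
A minimal sketch of one way to do this with the standard `re` module, reading the file line by line. It assumes each record header and each translation entry sits on its own line, as in the sample, and that the first listed translation is the one wanted. The names `HEADER`, `ENTRY` and `first_translations` are made up for this illustration.

```python
import re

# A record header looks like:      "confident" => {
HEADER = re.compile(r'^\s*"((?:[^"\\]|\\.)*)"\s*=>\s*\{')
# A translation entry looks like:  "ashahvasahta" => 0.74918568,
ENTRY = re.compile(r'^\s*"((?:[^"\\]|\\.)*)"\s*=>\s*[0-9.]+')

def first_translations(lines):
    """Yield (english_word, first_translation) pairs."""
    word = None  # English word whose first translation is still pending
    for line in lines:
        header = HEADER.match(line)
        if header:
            word = header.group(1)
            continue
        entry = ENTRY.match(line)
        if entry and word is not None:
            yield word, entry.group(1)
            word = None  # skip the remaining translations of this record

# Sample lines copied from the post; a real script would iterate over the file.
sample = [
    '"confident" => {',
    '  count => 4,',
    '  trans => {',
    ' "ashahvasahta" => 0.74918568,',
    '"atahmavaishahvaasa" => 0.09095465,',
    '"consumers" => {',
    '  count => 4,',
    '  trans => {',
    '"upabhaokahtaa" => 0.75144362,',
    r'"upabhaokahtaaom\.n" => 0.12980166,',
]

pairs = list(first_translations(sample))
for english, translation in pairs:
    print(english, translation)
```

The `count` and `trans` lines are skipped because they do not start with a quoted key.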

pyparsing wrong output

2010-02-12 Thread Eknath Venkataramani
I am trying to write a parser in pyparsing. Help me.
[link missing] is the code and this is the input file: [link missing].
I get output as: [output missing]

Eknath Venkataramani

extracting unicode text from pdfs

2010-05-24 Thread Eknath Venkataramani
I have around 45 PDFs containing text in _HINDI_ that I need to convert into
raw text. When I use the xpdf package, the generated text is garbled, so I'd
like to write a program which would convert the PDF text into Unicode text as it

The fonts used in the pdfs:
name                 type     emb sub uni object ID
-------------------- -------- --- --- --- ---------
APKAPP+Usha-Bold     Type 1C  yes yes yes     72  0
APKBBB+Agenda-Light  Type 1C  yes yes yes     77  0
APKBGF+Usha          Type 1C  yes yes yes     41  0
APKBKJ+Agenda-Medium Type 1C  yes yes yes     46  0
APKBON+Agenda-Bold   Type 1C  yes yes yes     49  0

For example, the PDF contains: आदमी मुसाफिर है
  When I use pdftotext, I get some very weird symbols: '...
  whereas I'd like 'आदमी मुसाफिर है' to be the output.

Eknath Venkataramani

Re: Replace in large text file ?

2010-06-06 Thread Eknath Venkataramani
On Sat, Jun 5, 2010 at 1:23 PM, Steve wrote:

> Remove all comma's
> Replace all @ with comma's
> Save as a new file.

Why don't you use 'sed'? It'd be way faster.
Eknath Venkataramani
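
For anyone who would rather stay in Python, the three quoted steps can be streamed line by line so that the whole file never has to fit in memory. This is only a sketch: the function name `convert` is made up, and the demo uses in-memory `io.StringIO` objects where a real run would pass file objects returned by `open(...)`.

```python
import io

def convert(src, dst):
    """Copy src to dst, dropping commas and turning '@' into commas."""
    # str.translate applies both substitutions in a single pass per line.
    table = str.maketrans({",": None, "@": ","})
    for line in src:  # iterating a file object reads one line at a time
        dst.write(line.translate(table))

# In-memory demo standing in for the large input and the new output file.
src = io.StringIO("a,b@c\n1@2,3\n")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue())
```

Because the translation happens in one pass, the commas produced from `@` are not removed again, which matches doing the two quoted steps in order.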

Re: Multiline regex

2010-07-21 Thread Eknath Venkataramani
On Wed, Jul 21, 2010 at 8:12 PM, Brandon Harris wrote:

> I'm trying to read in and parse an ascii type file that contains
> information that can span several lines.

Do you have to use only regex? If not, I'd certainly suggest 'pyparsing'.
It's a pleasure to use and very easy on the eye too, if you know what I mean.

>  I'm wanting to grab the information out in chunks, so

Eknath Venkataramani

Re: I need a starter ptr writing python embedded in html.

2010-08-07 Thread Eknath Venkataramani

On Sun, Aug 8, 2010 at 8:12 AM, Steven W. Orr wrote:

> I'm ok in python but I haven't done too much with web pages. I have a web
> page that is hand written in html that has about 1000 entries in a table
> and I want to convert the table from entries like this
>   Some Date String
>   SomeTag
>   A Title
>   Click
>   Some Comment
> to
>   SomePythonCall('Some Date String',
>                  'Some Comment')
> Can someone tell me what I should look at to do this? Is mod_python where I
> should start or are there things that are better?
> --
> Time flies like the wind. Fruit flies like a banana. Stranger things have
> happened but none stranger than this. Does your driver's license say Organ
> Donor? Black holes are where God divided by zero. Listen to me! We are all-
> individuals! What if this weren't a hypothetical question?
> steveo at
> --

Eknath Venkataramani

Re: How to swallow traceback message

2010-08-11 Thread Eknath Venkataramani
See Exception Handling.
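
As a rough illustration of what exception handling buys here: catching `KeyboardInterrupt` around the output loop lets the program stop quietly on Ctrl+C. This sketch assumes Python 3 (`BrokenPipeError` does not exist in Python 2); the function names `show` and `interrupted_lines` are made up, and the generator merely simulates Ctrl+C being pressed part-way through the output.

```python
import io

def show(lines, out):
    """Write lines to out; return an exit status instead of raising on Ctrl+C."""
    try:
        for line in lines:
            out.write(line)
    except KeyboardInterrupt:
        # Ctrl+C: stop quietly, without a traceback.
        return 130  # conventional shell status for "terminated by SIGINT"
    except BrokenPipeError:
        # The pager (e.g. more) exited before all output was read.
        return 0
    return 0

def interrupted_lines():
    """Stand-in for a long file: simulates the user pressing Ctrl+C."""
    yield "first line\n"
    raise KeyboardInterrupt

buffer = io.StringIO()
status = show(interrupted_lines(), buffer)
print("exit status:", status)
```

In a real script the call would look something like `sys.exit(show(open(path), sys.stdout))`, with `path` taken from the command line.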

On Wed, Aug 11, 2010 at 11:09 AM, Back9 wrote:

> Hi,
> I run my py app to display a file's contents, and it is normally very
> long.
> So I use it like below:
> python input_file | more
> to see them step by step.
> But when I try to exit it, normally I use Ctrl+ C key to quit it.
> Problem is every time I do like it, it shows Traceback message and it
> makes my app not professional.
> How do I handle it gracefully.
> --

Eknath Venkataramani