Re: [python-win32] Retrieve informations from NIST file.

Tim Roberts Thu, 30 Apr 2009 09:46:33 -0700

Khalid Moulfi wrote:
>
> thanks for your quick answer.
> Here is a sample of the first line of the NIST file :
>
> 1.001:00000002451.002:30001.003:1192041424344454647484941041141241341415151516151715181.004:NPS1.005:200810291.006:41.007:51/Live Scan1.008:51/Live Scan1.009:0844251404U1.011:19.68501.012:19.68502.001:00000001882.002:02.003:30002.010:10050001902.019:200810292.029:02.054:Civilian2.083:01NA02NA03NA04NA05NA06NA07NA08NA09NA10NA2.233:ÈÏæä2.235:1011973400606
>
> But as the end of the line is not displayed, I am sending you a copy of the
> file with the whole line.


That's because your file contains null bytes ('\x00').  The string you
display above shows everything up to the first null.
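A quick way to see this: a display that treats the data as a NUL-terminated C string stops at the first null byte. Here is a minimal Python 3 sketch of that behaviour on bytes. The helper names `visible_prefix` and `count_nulls` are illustrative, not from any library:

```python
def visible_prefix(data: bytes) -> bytes:
    """Return what a NUL-terminated (C-style) display would show:
    everything before the first null byte."""
    return data.split(b"\x00", 1)[0]


def count_nulls(data: bytes) -> int:
    """Count the null bytes anywhere in the data."""
    return data.count(b"\x00")
```

Running both on the raw file contents shows how much of the line a C-string view hides and how many nulls there are.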


> The thing is, even if I count the number of characters from, let's say,
> 2.001 to the end of the line, I do not get the real number of characters.

What do you mean by that?  Where did the numbers come from?  The file
contains one line of 471 bytes, including the newline.  Does that agree
with either of your sources?


> My goal is to modify this first line by adding new tags (with special
> characters), removing some of them, getting the real length, and after
> all this, writing the update back into the original .nst file.
>
> I'll try opening it in 'rb' mode, as you said, and see.

You will have to show me your code, along with what numbers you expect. 
The file you sent is 471 bytes long, and that's exactly what I read, in
both text and binary modes:

    C:\tmp>python
    Python 2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit
    (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x = open('sample_1005000190.nst')
    >>> y = open('sample_1005000190.nst', 'rb')
    >>> x1 = x.read()
    >>> y1 = y.read()
    >>> len(x1)
    471
    >>> len(y1)
    471
    >>> x1.find('2.001')
    245
    >>> x1[-2:]
    '\x00\n'
    >>> y1[-2:]
    '\x00\n'
    >>> x.seek(0,0)
    >>> x2 = x.readlines()
    >>> len(x2)
    1
    >>> len(x2[0])
    471
    >>>

The "2.001" is located at byte 245, so there should be 226 bytes (471 - 245)
from there to the end of the line.  However, there are zero bytes (meaning
'\x00') in this file, which might be confusing you.
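On Python 3, text mode decodes the bytes with the locale encoding and translates line endings. That can change both the content and the count, so binary mode is the reliable way to reproduce these numbers. The following is a minimal sketch that assumes the marker you care about is b"2.001". `bytes_to_end_of_line` and `file_report` are hypothetical helpers:

```python
def bytes_to_end_of_line(data: bytes, marker: bytes) -> int:
    """Count bytes from the start of `marker` up to and including the next
    newline (or to the end of the data if there is no newline after it)."""
    start = data.find(marker)
    if start < 0:
        raise ValueError("marker %r not found" % marker)
    newline = data.find(b"\n", start)
    end = len(data) if newline < 0 else newline + 1
    return end - start


def file_report(path: str, marker: bytes = b"2.001"):
    """Read the file in binary mode and return (total size, marker offset,
    bytes from the marker to the end of its line)."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data), data.find(marker), bytes_to_end_of_line(data, marker)
```

For the 471-byte sample, `file_report('sample_1005000190.nst')` should give (471, 245, 226), because the only newline is the last byte.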

You have to know something about this data format to know how to modify
it.  It looks like the file consists of two major sections, separated by
0x1C characters.  The major sections are then divided into records
separated by 0x1D characters.  Some of the records have fields in them,
separated by 0x1E, and some of those fields are split further by 0x1F.
There are 38 bytes of what look like garbage after the last field.  So,
you could parse it into records like this:

    Python 2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit
    (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> x = open('sample_1005000190.nst','rb').read()
    >>> sections = x.split('\x1c')
    >>> len(sections)
    3
    >>> [len(k) for k in sections]
    [244, 187, 38]
    >>> rec1 = sections[0].split('\x1d')
    >>> rec2 = sections[1].split('\x1d')
    >>> len(rec1)
    11
    >>> len(rec2)
    10
    >>> rec1
    ['1.001:0000000245', '1.002:3000',
    '1.003:1\x1f19\x1e2\x1f0\x1e4\x1f1\x1e4\x1f2\x1e4\x1f3\x1e4\x1f4\x1e4\x1f5\x1e4\x1f6\x1e4\x1f7\x1e4\x1f8\x1e4\x1f9\x1e4\x1f10\x1e4\x1f11\x1e4\x1f12\x1e4\x1f13\x1e4\x1f14\x1e15\x1f15\x1e15\x1f16\x1e15\x1f17\x1e15\x1f18',
    '1.004:NPS', '1.005:20081029', '1.006:4',
    '1.007:51/Live Scan', '1.008:51/Live Scan', '1.009:0844251404U',
    '1.011:19.6850', '1.012:19.6850']
    >>> rec2
    ['2.001:0000000188', '2.002:0', '2.003:3000', '2.010:1005000190',
    '2.019:20081029', '2.029:0', '2.054:Civilian',
    '2.083:01\x1fNA\x1e02\x1fNA\x1e03\x1fNA\x1e04\x1fNA\x1e05\x1fNA\x1e06\x1fNA\x1e07\x1fNA\x1e08\x1fNA\x1e09\x1fNA\x1e10\x1fNA',
    '2.233:\xc8\xcf\xe6\xe4', '2.235:1011973400606']
    >>>
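The session above splits only the first two levels. Below is a Python 3 sketch that carries the split down through 0x1E and 0x1F as well. In ANSI/NIST-ITL terms these are the FS, GS, RS and US separators between logical records, fields, subfields and information items. The sketch assumes, as above, that only the first two 0x1C-delimited sections are tagged text, and it leaves the rest as raw bytes. The function names are hypothetical:

```python
FS, GS, RS, US = b"\x1c", b"\x1d", b"\x1e", b"\x1f"


def parse_section(section: bytes) -> list:
    """Split one 0x1C-delimited section into (tag, fields) pairs.
    `fields` is a list of 0x1E-separated fields, each a list of
    0x1F-separated items; values are kept as raw bytes."""
    records = []
    for record in section.split(GS):
        tag, sep, value = record.partition(b":")
        if not sep:
            raise ValueError("record without a ':' separator: %r" % record)
        fields = [field.split(US) for field in value.split(RS)]
        records.append((tag.decode("ascii"), fields))
    return records


def parse_file(data: bytes, n_tagged: int = 2):
    """Parse the first `n_tagged` sections and return the remaining
    sections untouched, so FS.join() can restore them byte for byte."""
    sections = data.split(FS)
    parsed = [parse_section(s) for s in sections[:n_tagged]]
    return parsed, sections[n_tagged:]
```

Values stay as bytes, so fields with non-ASCII content such as 2.233 survive unchanged.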


Here, "sections" contains the three major sections.  "rec1" contains the
records from the first section.  If you wanted to add a "1.013" record
to the first section, you could say:
    rec1.append("1.013:Cool Beans")
and then rebuild the file by saying:
    newsections = ['\x1d'.join(rec1), '\x1d'.join(rec2), sections[2]]
    open('newfile.nst', 'wb').write('\x1c'.join(newsections))

But that assumes there's nothing in that garbage 3rd section that needs
to be changed.
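Since you also want the real length, check the X.001 records. In the sample, 1.001 holds 0000000245 and 2.001 holds 0000000188. Those are exactly the section sizes (244 and 187) plus one for the trailing 0x1C, which fits 1.001 being the logical record length field. If that reading is right, appending "1.013:Cool Beans" as above leaves 1.001 stale.

Here is a Python 3 sketch that removes and adds records and then rewrites the length. It makes three assumptions:

- the length is the first record of each section;
- the length keeps its zero-padded width;
- the length counts the trailing 0x1C.

The function names are hypothetical:

```python
FS, GS = b"\x1c", b"\x1d"


def set_length(records: list) -> list:
    """Rewrite the first record (e.g. b'1.001:0000000245') so its value is the
    section's byte length including the trailing 0x1C. The value keeps its
    zero-padded width, so rewriting it does not change the total."""
    tag, _, old = records[0].partition(b":")
    width = len(old)
    total = len(GS.join(records)) + 1  # +1 for the 0x1C that ends the section
    new_value = str(total).zfill(width).encode("ascii")
    if len(new_value) != width:
        raise ValueError("length %d no longer fits in %d digits" % (total, width))
    return [tag + b":" + new_value] + records[1:]


def edit_section(records: list, drop=(), add=()) -> list:
    """Remove records whose tag is in `drop` (never the length record),
    append the `add` records, then recompute the length record."""
    kept = records[:1] + [r for r in records[1:] if r.partition(b":")[0] not in drop]
    return set_length(kept + list(add))


def rebuild(sections: list) -> bytes:
    """Join tagged sections (lists of records) and raw tail sections (bytes)."""
    parts = [GS.join(s) if isinstance(s, list) else s for s in sections]
    return FS.join(parts)
```

Usage: read the file in 'rb' mode, split it as in the session above, call `edit_section(rec1, drop={b"1.011"}, add=[b"1.013:Cool Beans"])`, and write `rebuild([new_rec1, rec2, sections[2]])` to a file opened with 'wb'. Keeping everything as bytes means you choose the encoding of any special characters yourself.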

It's just a matter of dividing the problem up into smaller problems
until the solution pops out.

-- 
Tim Roberts, t...@probo.com
Providenza & Boekelheide, Inc.

_______________________________________________
python-win32 mailing list
python-win32@python.org
http://mail.python.org/mailman/listinfo/python-win32
