Dear all,
First of all, thanks for the answers and the information (I'll dig
into it). Let me try to add some comments on what I want to do:
* My ASCII file mainly contains data (float and int) in a single column
(this is not always the case, but I can easily manage it; I also saw
that I can use the 'split' instruction if necessary)
* Comments/text indicate the beginning of a block, immediately
followed by the number of sub-blocks
* So I need to read and store all the values in order to build a matrix
before working on it (using NumPy and vectorization)
* Columns 2 and 3 have been added for further processing
* The '0' values will be handled specifically afterward
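For the occasional line with more than one column, a minimal sketch of
the 'split' idea could look like the following. The helper name
parse_value is hypothetical, and it assumes that marker/comment lines
start with '#' (as '##BEGIN' does) and that only the first column is
of interest.

```python
def parse_value(line):
    """Return the first whitespace-separated token of `line` as a number.

    Returns None for blank lines and for lines starting with '#'
    (assumed here to be markers/comments such as '##BEGIN').
    """
    tokens = line.split()
    if not tokens or tokens[0].startswith("#"):
        return None
    token = tokens[0]
    try:
        return int(token)      # most values are integers
    except ValueError:
        return float(token)    # fall back to float for values like '3.5'


first = parse_value("42\n")
second = parse_value("3.5   some extra column\n")
marker = parse_value("##BEGIN\n")
```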
NumPy itself won't be a problem, I guess (I did some basic tests and I'm
quite confident about how to proceed), but I'm really stuck on reading
the data. I'm trying to find a way to efficiently read and store the
data in a matrix:
* avoiding dynamic memory allocation (meaning 'append' on Python lists
here, not np.append),
* dealing with huge ASCII files: the latest file I got contains more
than 60 million lines
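One way to meet both constraints might be to read a block twice: a
first pass that only counts, then a single np.zeros allocation, then a
second pass that fills the rows in place. What follows is only a
sketch under assumptions: the names scan_block, fill_block and N_META
are hypothetical, the layout is assumed to be the one in the extract
further down (number of sub-blocks, then for each sub-block its
number, its value count and its values, one per line), and all values
are assumed to be integers.

```python
import io
import itertools

import numpy as np

N_META = 4  # assumed leading columns: sub-block number, 2 spare columns, count


def _ints(lines):
    """Yield the first token of each non-blank, non-'#' line as an int."""
    for line in lines:
        tokens = line.split()
        if tokens and not tokens[0].startswith("#"):
            yield int(tokens[0])


def scan_block(lines):
    """First pass: return (number of sub-blocks, largest value count)."""
    values = _ints(lines)
    n_sub = next(values)
    max_count = 0
    for _ in range(n_sub):
        next(values)                      # sub-block number, not needed yet
        count = next(values)
        max_count = max(max_count, count)
        for _ in range(count):            # skip over the values themselves
            next(values)
    return n_sub, max_count


def fill_block(lines, n_sub, max_count):
    """Second pass: fill a matrix that is allocated exactly once."""
    values = _ints(lines)
    next(values)                          # number of sub-blocks, already known
    matrix = np.zeros((n_sub, N_META + max_count), dtype=np.int64)
    for row in range(n_sub):
        matrix[row, 0] = next(values)     # sub-block number
        count = next(values)
        matrix[row, 3] = count            # value count
        matrix[row, N_META:N_META + count] = np.fromiter(
            itertools.islice(values, count), dtype=np.int64, count=count)
    return matrix


# Tiny made-up block: sub-block 1 has 2 values, sub-block 2 has 3 values.
stream = io.StringIO("##BEGIN\n2\n1\n2\n12\n47\n2\n3\n5\n0\n7\n")
n_sub, max_count = scan_block(stream)
stream.seek(0)                            # rewind before the second pass
matrix = fill_block(stream, n_sub, max_count)
```

With a real file, the StringIO object would be replaced by an open
file handle, rewound with seek(0) between the two passes. Reading the
file twice costs time, but no Python list ever has to grow.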
Please find attached an extract of the input format
('example_of_input') and the matrix I'm trying to create and manage
with NumPy.
Thanks again for your time
Paul
#######################################
##BEGIN _-> line number x in the original file_
42 _-> indicates the number of sub-blocks_
1 _-> number of the 1st sub-block_
6 _-> gives how many values belong to the sub-block_
12
47
2
46
3
51
….
13 _-> another type of sub-block with 25 values_
25
15
88
21
42
22
76
19
89
0
18
80
23
38
24
73
20
81
0
90
0
41
0
39
0
77
…
42 _-> another type of sub-block with 2 values_
2
115
109
#######################################
THE MATRIX RESULT
1 0 0 6 12 47 2 46 3 51 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 6 3 50 11 70 12 51 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 8 11 50 3 49 4 54 5 57 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 8 12 70 11 66 9 65 10 68 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 8 2 47 12 68 10 44 1 43 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 8 5 56 6 58 7 61 11 57 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 8 11 61 7 60 8 63 9 66 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 19 12 47 2 46 3 51 0 13 97 14 92 15 96 0 72 0 48 0 52 0 0 0 0 0 0
9 0 0 19 13 97 14 92 15 96 0 16 86 17 82 18 85 0 95 0 91 0 90 0 0 0 0 0 0
10 0 0 19 3 50 11 70 12 51 0 15 89 19 94 13 96 0 52 0 71 0 72 0 0 0 0 0 0
11 0 0 19 15 89 19 94 13 96 0 18 81 20 84 16 85 0 90 0 77 0 95 0 0 0 0 0 0
12 0 0 25 3 49 4 54 5 57 11 50 0 15 88 21 42 22 76 19 89 0 52 0 53 0 55 0 71
13 0 0 25 15 88 21 42 22 76 19 89 0 18 80 23 38 24 73 20 81 0 90 0 41 0 39 0 77
14 0 0 25 11 66 9 65 10 68 12 70 0 19 78 25 99 26 98 13 94 0 71 0 67 0 69 0 72
….
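Because the data itself can contain 0, the padding zeros at the end of
each row have to be told apart from the '0' values that belong to a
sub-block. A possible way to do this without a Python loop, assuming
column 4 always holds the number of values in the row (as in the rows
above), is to build a boolean mask from that column. The names N_META,
is_data and data_zeros are only illustrative.

```python
import numpy as np

N_META = 4  # leading columns: sub-block number, 2 spare columns, value count

# Rows 1 and 8 copied from the matrix above (29 columns each).
row_1 = [1, 0, 0, 6, 12, 47, 2, 46, 3, 51] + [0] * 19
row_8 = [8, 0, 0, 19, 12, 47, 2, 46, 3, 51, 0, 13, 97, 14, 92, 15, 96,
         0, 72, 0, 48, 0, 52] + [0] * 6
matrix = np.array([row_1, row_8])

counts = matrix[:, 3]            # number of real values in each row
values = matrix[:, N_META:]      # view on the value columns only

# True where a cell holds real data, False where it is only padding.
is_data = np.arange(values.shape[1]) < counts[:, None]

# True only for the '0' values that are part of the data.
data_zeros = is_data & (values == 0)

zeros_per_row = data_zeros.sum(axis=1)
```

Here row 1 has no data zeros, while row 8 has four of them (at value
positions 6, 13, 15 and 17, counting from 0).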
#######################################
AN EXAMPLE OF THE CODE I STARTED TO WRITE
# -*- coding: utf-8 -*-
import time, sys, os, re
import itertools
import numpy as np

PATH = os.path.abspath('')
input_file_name = '/example_of_input.txt'

## check if the file exists, then if it's empty or not
if os.path.isfile(PATH + input_file_name):
    if os.stat(PATH + input_file_name).st_size > 0:
        ## go through the file in order to find specific sentences
        ## specific blocks will be defined afterward
        Block_position = []  # 0-based line numbers of the '##BEGIN' markers
        j = 0
        with open(PATH + input_file_name, "r") as data:
            for line in data:
                if '##BEGIN' in line:
                    Block_position.append(j)
                j = j + 1
        ## just some tests to get all the values
        # i = 0
        # data = np.zeros( (505), dtype=int )
        # with open(PATH + input_file_name, "r") as f:
        #     for i in range (0,505):
        #         data[i] = int(f.read(Block_position[0]+1+i))
        #         print ("i = ", i)
        #     for line in itertools.islice(f,Block_position[0],516):
        #         data[i]=f.read(0+i)
        #         i=i+1
    else:
        print("The file %s is empty : post-processing cannot be performed !!!\n" % input_file_name)
else:
    print("Error : the file %s does not exist: post-processing stops !!!\n" % input_file_name)
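Since the line numbers stored in Block_position still require reading
from the top of the file to reach a given block, one alternative might
be to store byte offsets instead and jump there with seek(). This is a
sketch only: find_block_offsets is a hypothetical helper, and it
assumes the file is opened in binary mode so that len(line) is a byte
count.

```python
import io


def find_block_offsets(stream, marker=b"##BEGIN"):
    """Return the byte offset of every line that contains `marker`.

    `stream` must be a binary stream (e.g. open(path, "rb")).
    """
    offsets = []
    position = 0
    for line in stream:
        if marker in line:
            offsets.append(position)
        position += len(line)   # bytes consumed so far
    return offsets


# Made-up content standing in for the real file.
sample = io.BytesIO(b"header\n##BEGIN\n2\n1\n1\n5\n##BEGIN\n1\n7\n1\n9\n")
offsets = find_block_offsets(sample)

# Jump straight to the second block without re-reading what precedes it.
sample.seek(offsets[1])
second_marker = sample.readline()
```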
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion