On May 3, 4:08 am, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> ----Original message----
> From: [EMAIL PROTECTED]
> Date: 3 May 2007, 10:02
> To: <[EMAIL PROTECTED]>
> Subject: problem with meteo datas
>
> Hello,
> I'm Peter and I'm new in python coding and I'm using pyparsing to
> extract data from one meteo Arpege file.
> This file is long file and it's composed by word and number
> arguments like this:
>
> GRILLE EURAT5
> Coin Nord-Ouest : 46.50/ 0.50 Coin Sud-Est : 44.50/ 2.50
> MODELE PA
> PARAMETRE P
> NIVEAU MER 0 ECHEANCE 0.0 DATE 20020304000000 NB_POINTS 25
> 1020.91 1020.87 1020.91 1021.05 1021.13
> 1020.07 1020.27 1020.49 1020.91 1021.15
> 1019.37 1019.65 1019.79 1020.53 1020.77
> 1018.73 1018.89 1019.19 1019.83 1020.81
> 1018.05 1018.19 1018.75 1019.55 1020.27
> NIVEAU MER 0 ECHEANCE 3.0 DATE 20020304000000 NB_POINTS 25
> 1019.80 1019.78 1019.92 1020.18 1020.34
> 1018.94 1019.24 1019.54 1020.08 1020.32
> 1018.24 1018.64 1018.94 1019.84 1019.98
> 1017.48 1017.88 1018.28 1018.98 1019.98
> 1016.62 1017.08 1017.66 1018.26 1018.34
> NIVEAU MER 0 ECHEANCE 6.0 DATE 20020304000000 NB_POINTS 25
> 1019.37 1019.39 1019.57 ........
> ........

Peter -

Your first attempt at pyparsing is a good step - just get something working! You've got a first pattern working that detects and extracts all decimal numbers. (I think you are the first one to model a decimal number as a delimited list of integers with "." as the delimiter.)

The next step is to start looking for some higher-level text groups or patterns. Your data is well structured as an n-level hierarchy, which looks to me like:

- model+parameter
  - level
    - nb_points
  - level
    - nb_points
  - level
    - nb_points
- model+parameter
  - level
    - nb_points
  - level
    - nb_points
...

You can build your pyparsing grammar from the ground up, first to parse individual terminal expressions (such as decimal numbers, which you already have), and then build up to more and more complex structures within your data.

The first thing to change about your approach is to start looking at this data as a whole, instead of line by line. Instead of extracting this first line of 5 point values:

    1020.91 1020.87 1020.91 1021.05 1021.13

look at this as one piece of a larger structure, a data set for a given niveau:

    NIVEAU MER 0 ECHEANCE 0.0 DATE 20020304000000 NB_POINTS 25
    1020.91 1020.87 1020.91 1021.05 1021.13
    1020.07 1020.27 1020.49 1020.91 1021.15
    1019.37 1019.65 1019.79 1020.53 1020.77
    1018.73 1018.89 1019.19 1019.83 1020.81
    1018.05 1018.19 1018.75 1019.55 1020.27

So let's create a parser for this structure that is the next step up in the data hierarchy. NIVEAU, ECHEANCE, DATE, and NB_POINTS are helpful labels for marking the data, but not really important to return in the parsed results.
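
As a small illustration of the ground-up approach, the terminal expressions at the bottom of this hierarchy might be sketched as follows. The definitions of dec and eurodec here are assumptions made for illustration (the thread does not show Peter's originals); they use Combine rather than a delimited list, and dec also accepts a leading minus sign.

```python
from pyparsing import Combine, Optional, Word, nums

# Hypothetical stand-in for Peter's `dec`: a decimal such as "1020.91"
# or "-0.36", returned as one combined string token.
dec = Combine(Optional("-") + Word(nums) + "." + Word(nums))

# Hypothetical `eurodec`: same idea with a comma, as in "2,4";
# a bare integer such as "2" is accepted too.
eurodec = Combine(Word(nums) + Optional("," + Word(nums)))

# Scan one line of point values; each match is a one-token result.
line = "1020.91 1020.87 1020.91 1021.05 1021.13"
values = [tokens[0] for tokens in dec.searchString(line)]
print(values)
print(eurodec.parseString("2,4")[0])
```

The values are still strings at this point; converting them to float can be done afterwards or in a parse action.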

So I will start by creating definitions for these labels which will parse them, but suppress them from the returned data:

    NIVEAU, ECHEANCE, DATE, NB_POINTS = \
        map(Suppress, "NIVEAU ECHEANCE DATE NB_POINTS".split())

You stated that there are several options for what a niveau identifier can look like, so this should be its own expression:

    niveau_ref = Literal("MER 0") | Literal("SOL 0") | \
                 Combine(Literal("HAUTEUR ") + eurodec)

(I defined eurodec as you defined dec, but with a comma delimiter.)

I'll also define a dateString as a Word(nums) of exactly 14 digits, but you can come back to this later and refine it as you like (build in parse-time conversion, for example).

    dateString = Word(nums, exact=14)

And then you can create an expression for a full niveau's-worth of data:

    niveau = NIVEAU + niveau_ref + ECHEANCE + dec + DATE + dateString + \
             NB_POINTS + countedArray(dec)

Notice that we can use the pyparsing built-in countedArray to capture all of the data point values, since NB_POINTS gives the number of points to follow, and these are followed immediately by the points themselves. Pyparsing will convert all of these into a nice n-element list for us.

You astutely requested that these values should be accessible like values in a dict, so we do this in pyparsing by adding results names:

    niveau = NIVEAU + niveau_ref.setResultsName("niveau") + \
             ECHEANCE + dec.setResultsName("echeance") + \
             DATE + dateString.setResultsName("date") + \
             NB_POINTS + countedArray(dec).setResultsName("nb_points")

Now you should be able to search through your data file, extracting all of the niveaux (?) and their related data:

    f = file("arqal-Arpege.00", "r")
    fdata = f.read()  # read the entire file, instead of going line-by-line
    for n in niveau.searchString(fdata):
        print n.niveau
        print n.dump()
        pointValues = map(float, n.nb_points[0])
        print "- NB_POINTS mean:", sum(pointValues) / len(pointValues)
        print

(I also added some examples of extracting data using the results names.
You can also use dict-style notation, n["niveau"], if you prefer.)

This gives the following output (I've truncated with '...' for the sake of Usenet posting, but the actual program gives the full lists of values):

    MER 0
    ['MER 0', '0.0', '20020304000000', ['1020.91', '1020.87', ...
    - date: 20020304000000
    - echeance: 0.0
    - nb_points: [['1020.91', '1020.87', '1020.91', '1021.05', ...
    - niveau: MER 0
    - NB_POINTS mean: 1020.0052

    MER 0
    ['MER 0', '3.0', '20020304000000', ['1019.80', '1019.78', ...
    - date: 20020304000000
    - echeance: 3.0
    - nb_points: [['1019.80', '1019.78', '1019.92', '1020.18', ...
    - niveau: MER 0
    - NB_POINTS mean: 1018.9736

    MER 0
    ['MER 0', '48.0', '20020304000000', ['1017.84', '1017.46', ...
    - date: 20020304000000
    - echeance: 48.0
    - nb_points: [['1017.84', '1017.46', '1017.14', '1016.86', ...
    - niveau: MER 0
    - NB_POINTS mean: 1015.9168

    HAUTEUR 2
    ['HAUTEUR 2', '0.0', '20020304000000', ['1.34', '1.51', '1.40', ...
    - date: 20020304000000
    - echeance: 0.0
    - nb_points: [['1.34', '1.51', '1.40', '0.56', '-0.36', '1.73', ...
    - niveau: HAUTEUR 2
    - NB_POINTS mean: 0.9028

    HAUTEUR 2,4
    ['HAUTEUR 2,4', '3.0', '20020304000000', ['1.34', '1.51', '1.40', ...
    - date: 20020304000000
    - echeance: 3.0
    - nb_points: [['1.34', '1.51', '1.40', '0.56', '-0.36', '1.73', ...
    - niveau: HAUTEUR 2,4
    - NB_POINTS mean: 0.9028

Now I'll let you take this the next step: compose the expression for the model+parameter hierarchy level (hint: the body of each model+parameter value will be an expression of OneOrMore( Group( niveau ) ) - be sure to give this a results name, too).

-- Paul
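
For reference, the niveau-level pieces above can be assembled into one self-contained script for Python 3. This is only a sketch under a few assumptions: dec and eurodec are hypothetical stand-ins (the originals are not shown in the thread), the sample text is the first two data sets from the quoted message instead of the real file, and flat_floats is a helper introduced here because, as far as I know, older pyparsing releases wrap the countedArray values in one extra nested list while newer ones do not.

```python
from pyparsing import (Combine, Literal, Optional, Suppress, Word,
                       countedArray, nums)

# Hypothetical stand-ins for `dec` and `eurodec` (not shown in the thread).
dec = Combine(Optional("-") + Word(nums) + "." + Word(nums))
eurodec = Combine(Word(nums) + Optional("," + Word(nums)))

# Labels are matched but suppressed from the results.
NIVEAU, ECHEANCE, DATE, NB_POINTS = map(
    Suppress, "NIVEAU ECHEANCE DATE NB_POINTS".split())

niveau_ref = (Literal("MER 0") | Literal("SOL 0")
              | Combine(Literal("HAUTEUR ") + eurodec))
dateString = Word(nums, exact=14)

niveau = (NIVEAU + niveau_ref.setResultsName("niveau")
          + ECHEANCE + dec.setResultsName("echeance")
          + DATE + dateString.setResultsName("date")
          + NB_POINTS + countedArray(dec).setResultsName("nb_points"))


def flat_floats(tokens):
    """Flatten a possibly nested token list into a list of floats."""
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(float(tok))
        else:
            out.extend(flat_floats(tok))
    return out


# Sample text copied from the quoted message (stands in for the real file).
sample = """\
GRILLE EURAT5
Coin Nord-Ouest : 46.50/ 0.50 Coin Sud-Est : 44.50/ 2.50
MODELE PA
PARAMETRE P
NIVEAU MER 0 ECHEANCE 0.0 DATE 20020304000000 NB_POINTS 25
1020.91 1020.87 1020.91 1021.05 1021.13
1020.07 1020.27 1020.49 1020.91 1021.15
1019.37 1019.65 1019.79 1020.53 1020.77
1018.73 1018.89 1019.19 1019.83 1020.81
1018.05 1018.19 1018.75 1019.55 1020.27
NIVEAU MER 0 ECHEANCE 3.0 DATE 20020304000000 NB_POINTS 25
1019.80 1019.78 1019.92 1020.18 1020.34
1018.94 1019.24 1019.54 1020.08 1020.32
1018.24 1018.64 1018.94 1019.84 1019.98
1017.48 1017.88 1018.28 1018.98 1019.98
1016.62 1017.08 1017.66 1018.26 1018.34
"""

results = []
for n in niveau.searchString(sample):
    points = flat_floats(n.nb_points)
    mean = sum(points) / len(points)
    results.append((n.niveau, n.echeance, n.date, len(points), mean))
    print(n.niveau, n.echeance, n.date, "mean:", mean)
```

The two means should agree, up to floating-point rounding, with the 1020.0052 and 1018.9736 shown in the output above. Recent pyparsing releases also offer snake_case spellings of these names (for example search_string and counted_array), so either style may be seen in newer code.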