Re: Formatting Results so that They Can be Nicely Imported into a Spreadsheet.
En Sun, 05 Aug 2007 06:06:54 -0300, SMERSH009 <[EMAIL PROTECTED]> escribió: > The only question that remains for me--and this is just for my > knowledge-- what does the "if i" mean in this code snippet? > f = [i.split() for i in d if i] > How is it helpful to leave a dangling "if i"? Why not just f = > [i.split() for i in d]? `if i` means `if i is considered true`. In this case we are talking about strings: "" is false and all other strings are true. So this is a way to say `if i is not the empty string`, effectively filtering out empty lines. Perhaps using more meaningful names for variables makes the code more clear: exploded_lines = [line.split() for line in lines if line] -- Gabriel Genellina -- http://mail.python.org/mailman/listinfo/python-list
Re: Formatting Results so that They Can be Nicely Imported into a Spreadsheet.
[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: ... > Even with the "if i" included, we end up with an > empty list at the start. This because the first "blank" > line wasn't blank, it was a space, so it passes the > "if i" test. ...and you can fix that by changing the test to [... if i.split()]. Alex -- http://mail.python.org/mailman/listinfo/python-list
Re: Formatting Results so that They Can be Nicely Imported into a Spreadsheet.
[EMAIL PROTECTED] wrote: > In a message dated 8/4/2007 11:50:05 PM Central Daylight Time, > [EMAIL PROTECTED] writes: > > [EMAIL PROTECTED] wrote: > > On Aug 4, 6:35?pm, SMERSH009 <[EMAIL PROTECTED]> wrote: > >> Hi All. > >> Let's say I have some badly formatted text called doc: > >> > >> doc= > >> """ > >> friendid > >> Female > >> > >> 23 years old > >> > >> Los Gatos > >> > >> United States > >> friendid > >> Male > >> > >> 24 years old > >> > >> San Francisco, California > >> > >> United States > >> """ > >> > >> How would I get these results to be displayed in a format similar to: > >> friendid;Female;23 years old;Los Gatos;United States > >> friendid;Male; 24 years old;San Francisco, California;United States > >> > >> The latter is a lot easier to organize and can be quickly imported > >> into Excel's column format. > >> > >> Thanks Much, > >> Sam > > > > d = doc.split('\n') > > > > f = [i.split() for i in d if i] > > > > g = [' '.join(i) for i in f] > > > > rec = [] > > temprec = [] > > for i in g: > > if i: > > if i == 'friendid': > > rec.append(temprec) > > temprec = [i] > > else: > > temprec.append(i) > > rec.append(temprec) > > > > output = [';'.join(i) for i in rec if i] > > > > for i in output: print i > > > > ##friendid;Female;23 years old;Los Gatos;United States > > ##friendid;Male;24 years old;San Francisco, California;United > > States > > > > Also : > > docList = [ i.strip() for i in doc.split('\n') if i.strip()] > > lines = [i for i in xrange(len(docList)) if docList[i] == > 'friendid']+[len(docList)] > > docOut = '' > for k in [docList[lines[j]:lines[j+1]] for j in xrange(len(lines)-1)]: > docOut += '\n' + ';'.join(k) > > docOut = docOut[1:]# Get rid of initial '\n' > > Aren't you making an unwarranted assumption here? > That doc ALWAYS starts with EXACTLY one blank line? I guess you are referring to : docList = [ i.strip() for i in doc.split('\n') if i.strip()] Actually no, "doc.split('\n')" will split it a list member per line, whether there are blank lines or not. Then "i.strip()" will take care of redundant space, and "if i.strip()" will take care of blank lines. Try it in your shell, play with "doc" and you'll see (I did try it without the first blank line, and with a lot of blank lines and it's ok). > > That's why I didn't use a list comprehension in that > one section, to cover the possibility of any number > (including none) of blank lines. It works, didn't make a thorough testing but the cases you suggest were tested. > > By blindly coding [1:] you run the risk of data loss, > and that's a poor example for the OP. It is not "blindly". I guarantee the first byte will ALLWAYS be a '\n', that's because I'm putting it there with "docOut += '\n' + ';'.join(k)" (check the '\n' added at the beginning of each cycle), and I need to strip the beginning '\n' which is not necessary. Cheers Ricardo -- http://mail.python.org/mailman/listinfo/python-list
Re: Formatting Results so that They Can be Nicely Imported into a Spreadsheet.
On Aug 5, 4:06?am, SMERSH009 <[EMAIL PROTECTED]> wrote: > On Aug 4, 8:25 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > > > > > > > On Aug 4, 9:21?pm, "Jim Langston" <[EMAIL PROTECTED]> wrote: > > > > <[EMAIL PROTECTED]> wrote in message > > > >news:[EMAIL PROTECTED] > > > > > On Aug 4, 6:35?pm, SMERSH009 <[EMAIL PROTECTED]> wrote: > > > >> Hi All. > > > >> Let's say I have some badly formatted text called doc: > > > > >> doc= > > > >> """ > > > >> friendid > > > >> Female > > > > >> 23 years old > > > > >> Los Gatos > > > > >> United States > > > >> friendid > > > >> Male > > > > >> 24 years old > > > > >> San Francisco, California > > > > >> United States > > > >> """ > > > > >> How would I get these results to be displayed in a format similar to: > > > >> friendid;Female;23 years old;Los Gatos;United States > > > >> friendid;Male; 24 years old;San Francisco, California;United States > > > > >> The latter is a lot easier to organize and can be quickly imported > > > >> into Excel's column format. > > > > >> Thanks Much, > > > >> Sam > > > > > d = doc.split('\n') > > > > > f = [i.split() for i in d if i] > > > > > g = [' '.join(i) for i in f] > > > > > rec = [] > > > > temprec = [] > > > > for i in g: > > > >if i: > > > >if i == 'friendid': > > > >rec.append(temprec) > > > >temprec = [i] > > > >else: > > > >temprec.append(i) > > > > rec.append(temprec) > > > > > output = [';'.join(i) for i in rec if i] > > > > > for i in output: print i > > > > > ##friendid;Female;23 years old;Los Gatos;United States > > > > ##friendid;Male;24 years old;San Francisco, California;United States > > > > also, I would suggest you use CSV format. > > > Well, the OP asked for a specific format. One is not > > always at liberty to change it. > > > > CSV stands for "Comma Seperated > > > Variable" and Excel can load such a sheet directly. > > > And Excel can load the shown format directly also, > > just specify the delimiter. > > > > Instead of seperating using ; seperate using , Of course, this provides a > > > problem when there is a , in a string. > > > Which explains the popularity of using tabs as delimiters. > > The data deliverable specification I use at work > > uses the pipe character | which never appears as data > > in this particular application. > > > > Resolution is to quote the string. > > > Which makes the file bigger and isn't necessary > > when tabs and pipes are used as delimiters. > > > > Being such, you can just go ahead and quote all strings. So you would > > > want > > > the output to be: > > > > "friendid","Female","23 years old","Los Gatos","United States" > > > "friendid","Male","24 years old","San Francisco, California","United > > > States" > > > Which I would do if I had a specification that > > demanded it or was making files for others. For my > > own use, I wouldn't bother as it's unnecessary work. > > > > Numbers should not be quoted if you wish to treat them as numeric and not > > > text. > > > A good reason not to use quotes at all. Besides which, > > Excel can handle that also. > > Thanks for all your posts guys. > mensanator's was the most helpful, and I only ended up needing to use > a few lines from that code. > The only question that remains for me--and this is just for my > knowledge-- what does the "if i" mean in this code snippet? > f = [i.split() for i in d if i] d is a list of strings, some of which are empty (caused by the previous split). "if i" evaluates False for an empty string, so that particular i will not get split and ends up ommitted from f. > How is it helpful to leave a dangling "if i"? Why not just f = > [i.split() for i in d]? Compare the result of having "if i": [[], ['friendid'], ['Female'], ['23', 'years', 'old'], ['Los', 'Gatos'], ['United', 'States'], ['friendid'], ['Male'], ['24', 'years', 'old'], ['San', 'Francisco,', 'California'], ['United', 'States']] to not having it: >>> ff = [i.split() for i in d] >>> ff [[], ['friendid'], ['Female'], [], [], ['23', 'years', 'old'], [], [], ['Los', 'Gatos'], [], [], ['United', 'States'], ['friendid'], ['Male'], [], [], ['24', 'years', 'old'], [], [], ['San', 'Francisco,', 'California'], [], [], ['United', 'States'], []] All the blank lines became empty lists which actually wouldn't change anything if they get joined. But that won't always be the case. Knowing how to use conditionals in list comprehensions is worth knowing. Even with the "if i" included, we end up with an empty list at the start. This because the first "blank" line wasn't blank, it was a space, so it passes the "if i" test. > > And yes John, this was indeed a "homework question." It was for my > daughter's preschool. You are going to help her ace her beginner > Python class! (No, this was not a homework question).
Re: Formatting Results so that They Can be Nicely Imported into a Spreadsheet.
On Aug 4, 8:25 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > On Aug 4, 9:21?pm, "Jim Langston" <[EMAIL PROTECTED]> wrote: > > > > > <[EMAIL PROTECTED]> wrote in message > > >news:[EMAIL PROTECTED] > > > > On Aug 4, 6:35?pm, SMERSH009 <[EMAIL PROTECTED]> wrote: > > >> Hi All. > > >> Let's say I have some badly formatted text called doc: > > > >> doc= > > >> """ > > >> friendid > > >> Female > > > >> 23 years old > > > >> Los Gatos > > > >> United States > > >> friendid > > >> Male > > > >> 24 years old > > > >> San Francisco, California > > > >> United States > > >> """ > > > >> How would I get these results to be displayed in a format similar to: > > >> friendid;Female;23 years old;Los Gatos;United States > > >> friendid;Male; 24 years old;San Francisco, California;United States > > > >> The latter is a lot easier to organize and can be quickly imported > > >> into Excel's column format. > > > >> Thanks Much, > > >> Sam > > > > d = doc.split('\n') > > > > f = [i.split() for i in d if i] > > > > g = [' '.join(i) for i in f] > > > > rec = [] > > > temprec = [] > > > for i in g: > > >if i: > > >if i == 'friendid': > > >rec.append(temprec) > > >temprec = [i] > > >else: > > >temprec.append(i) > > > rec.append(temprec) > > > > output = [';'.join(i) for i in rec if i] > > > > for i in output: print i > > > > ##friendid;Female;23 years old;Los Gatos;United States > > > ##friendid;Male;24 years old;San Francisco, California;United States > > > also, I would suggest you use CSV format. > > Well, the OP asked for a specific format. One is not > always at liberty to change it. > > > CSV stands for "Comma Seperated > > Variable" and Excel can load such a sheet directly. > > And Excel can load the shown format directly also, > just specify the delimiter. > > > > > Instead of seperating using ; seperate using , Of course, this provides a > > problem when there is a , in a string. > > Which explains the popularity of using tabs as delimiters. > The data deliverable specification I use at work > uses the pipe character | which never appears as data > in this particular application. > > > Resolution is to quote the string. > > Which makes the file bigger and isn't necessary > when tabs and pipes are used as delimiters. > > > Being such, you can just go ahead and quote all strings. So you would want > > the output to be: > > > "friendid","Female","23 years old","Los Gatos","United States" > > "friendid","Male","24 years old","San Francisco, California","United States" > > Which I would do if I had a specification that > demanded it or was making files for others. For my > own use, I wouldn't bother as it's unnecessary work. > > > > > Numbers should not be quoted if you wish to treat them as numeric and not > > text. > > A good reason not to use quotes at all. Besides which, > Excel can handle that also. Thanks for all your posts guys. mensanator's was the most helpful, and I only ended up needing to use a few lines from that code. The only question that remains for me--and this is just for my knowledge-- what does the "if i" mean in this code snippet? f = [i.split() for i in d if i] How is it helpful to leave a dangling "if i"? Why not just f = [i.split() for i in d]? And yes John, this was indeed a "homework question." It was for my daughter's preschool. You are going to help her ace her beginner Python class! (No, this was not a homework question). -- http://mail.python.org/mailman/listinfo/python-list
Re: Formatting Results so that They Can be Nicely Imported into a Spreadsheet.
On Aug 4, 9:21?pm, "Jim Langston" <[EMAIL PROTECTED]> wrote: > <[EMAIL PROTECTED]> wrote in message > > news:[EMAIL PROTECTED] > > > > > > > On Aug 4, 6:35?pm, SMERSH009 <[EMAIL PROTECTED]> wrote: > >> Hi All. > >> Let's say I have some badly formatted text called doc: > > >> doc= > >> """ > >> friendid > >> Female > > >> 23 years old > > >> Los Gatos > > >> United States > >> friendid > >> Male > > >> 24 years old > > >> San Francisco, California > > >> United States > >> """ > > >> How would I get these results to be displayed in a format similar to: > >> friendid;Female;23 years old;Los Gatos;United States > >> friendid;Male; 24 years old;San Francisco, California;United States > > >> The latter is a lot easier to organize and can be quickly imported > >> into Excel's column format. > > >> Thanks Much, > >> Sam > > > d = doc.split('\n') > > > f = [i.split() for i in d if i] > > > g = [' '.join(i) for i in f] > > > rec = [] > > temprec = [] > > for i in g: > >if i: > >if i == 'friendid': > >rec.append(temprec) > >temprec = [i] > >else: > >temprec.append(i) > > rec.append(temprec) > > > output = [';'.join(i) for i in rec if i] > > > for i in output: print i > > > ##friendid;Female;23 years old;Los Gatos;United States > > ##friendid;Male;24 years old;San Francisco, California;United States > > also, I would suggest you use CSV format. Well, the OP asked for a specific format. One is not always at liberty to change it. > CSV stands for "Comma Seperated > Variable" and Excel can load such a sheet directly. And Excel can load the shown format directly also, just specify the delimiter. > > Instead of seperating using ; seperate using , Of course, this provides a > problem when there is a , in a string. Which explains the popularity of using tabs as delimiters. The data deliverable specification I use at work uses the pipe character | which never appears as data in this particular application. > Resolution is to quote the string. Which makes the file bigger and isn't necessary when tabs and pipes are used as delimiters. > Being such, you can just go ahead and quote all strings. So you would want > the output to be: > > "friendid","Female","23 years old","Los Gatos","United States" > "friendid","Male","24 years old","San Francisco, California","United States" Which I would do if I had a specification that demanded it or was making files for others. For my own use, I wouldn't bother as it's unnecessary work. > > Numbers should not be quoted if you wish to treat them as numeric and not > text. A good reason not to use quotes at all. Besides which, Excel can handle that also. -- http://mail.python.org/mailman/listinfo/python-list
Re: Formatting Results so that They Can be Nicely Imported into a Spreadsheet.
<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On Aug 4, 6:35?pm, SMERSH009 <[EMAIL PROTECTED]> wrote: >> Hi All. >> Let's say I have some badly formatted text called doc: >> >> doc= >> """ >> friendid >> Female >> >> 23 years old >> >> Los Gatos >> >> United States >> friendid >> Male >> >> 24 years old >> >> San Francisco, California >> >> United States >> """ >> >> How would I get these results to be displayed in a format similar to: >> friendid;Female;23 years old;Los Gatos;United States >> friendid;Male; 24 years old;San Francisco, California;United States >> >> The latter is a lot easier to organize and can be quickly imported >> into Excel's column format. >> >> Thanks Much, >> Sam > > d = doc.split('\n') > > f = [i.split() for i in d if i] > > g = [' '.join(i) for i in f] > > rec = [] > temprec = [] > for i in g: >if i: >if i == 'friendid': >rec.append(temprec) >temprec = [i] >else: >temprec.append(i) > rec.append(temprec) > > output = [';'.join(i) for i in rec if i] > > for i in output: print i > > ##friendid;Female;23 years old;Los Gatos;United States > ##friendid;Male;24 years old;San Francisco, California;United States also, I would suggest you use CSV format. CSV stands for "Comma Seperated Variable" and Excel can load such a sheet directly. Instead of seperating using ; seperate using , Of course, this provides a problem when there is a , in a string. Resolution is to quote the string. Being such, you can just go ahead and quote all strings. So you would want the output to be: "friendid","Female","23 years old","Los Gatos","United States" "friendid","Male","24 years old","San Francisco, California","United States" Numbers should not be quoted if you wish to treat them as numeric and not text. -- http://mail.python.org/mailman/listinfo/python-list
Re: Formatting Results so that They Can be Nicely Imported into a Spreadsheet.
On Aug 4, 6:35?pm, SMERSH009 <[EMAIL PROTECTED]> wrote: > Hi All. > Let's say I have some badly formatted text called doc: > > doc= > """ > friendid > Female > > 23 years old > > Los Gatos > > United States > friendid > Male > > 24 years old > > San Francisco, California > > United States > """ > > How would I get these results to be displayed in a format similar to: > friendid;Female;23 years old;Los Gatos;United States > friendid;Male; 24 years old;San Francisco, California;United States > > The latter is a lot easier to organize and can be quickly imported > into Excel's column format. > > Thanks Much, > Sam d = doc.split('\n') f = [i.split() for i in d if i] g = [' '.join(i) for i in f] rec = [] temprec = [] for i in g: if i: if i == 'friendid': rec.append(temprec) temprec = [i] else: temprec.append(i) rec.append(temprec) output = [';'.join(i) for i in rec if i] for i in output: print i ##friendid;Female;23 years old;Los Gatos;United States ##friendid;Male;24 years old;San Francisco, California;United States -- http://mail.python.org/mailman/listinfo/python-list
Re: Formatting Results so that They Can be Nicely Imported into a Spreadsheet.
On Aug 5, 9:35 am, SMERSH009 <[EMAIL PROTECTED]> wrote: > Hi All. > Let's say I have some badly formatted text called doc: > > doc= > """ > friendid > Female > > 23 years old > > Los Gatos > > United States > friendid > Male > > 24 years old > > San Francisco, California > > United States > """ > > How would I get these results to be displayed in a format similar to: > friendid;Female;23 years old;Los Gatos;United States > friendid;Male; 24 years old;San Francisco, California;United States > > The latter is a lot easier to organize and can be quickly imported > into Excel's column format. > You write a script to read your input file and write it out either in CSV format, using the Python csv module, or just use ';'.join(output_row) if you are sure that there are no ';' characters in the data. Is this a homework question? -- http://mail.python.org/mailman/listinfo/python-list
Formatting Results so that They Can be Nicely Imported into a Spreadsheet.
Hi All. Let's say I have some badly formatted text called doc: doc= """ friendid Female 23 years old Los Gatos United States friendid Male 24 years old San Francisco, California United States """ How would I get these results to be displayed in a format similar to: friendid;Female;23 years old;Los Gatos;United States friendid;Male; 24 years old;San Francisco, California;United States The latter is a lot easier to organize and can be quickly imported into Excel's column format. Thanks Much, Sam -- http://mail.python.org/mailman/listinfo/python-list