Re: Splitting on '^' ?
> And .splitlines seems to be able to handle all "standard" end-of-line
> markers without any special direction (which, ironically, strikes
> me as a *little* Perlish, somehow):
>
> >>> "spam\015\012ham\015eggs\012".splitlines(True)
> ['spam\r\n', 'ham\r', 'eggs\n']

... actually "working correctly" and robustly is "perlish"? :)

The only reason I've ever actually used this method is this very feature of it, which you can't readily reproduce with other methods unless you start getting into regular expressions (and I really believe regular expressions should not be the default place one looks to solve a problem in Python).

Then again, as soon as Python started allowing you to open files with mode "rU", I gleefully ran through my codebase and changed every operation to that and made sure to write out with platform-local newlines exclusively, thus finally flipping off those darn files that users kept producing with mixed line endings.

> Amazing. I'm not sure this is the *best* way to do this in general
> (I would have preferred it, and IMHO it would have been more
> Pythonic, if .splitlines accepted an additional optional argument
> where one could specify the end-of-line sequence to be used for
> the splitting, defaulting to the OS's conventional sequence, and
> then it split *strictly* on that sequence).

If you want strict and absolute splitting, you don't need another method; just do mystring.split(os.linesep). Sure, it doesn't have the 'keepends' feature, but I don't actually understand why you want keepends with a strict definition of endings... If you /only/ want to split on \n, you know there's an \n on the end of each line in the returned list and can easily be sure to write it out (for example) :)

In the modern world of mixed systems and the internet, with files being flung around willy-nilly and editors being configured to varying degrees of correctness, it's Pythonic to be able to handle all these files that anyone made on any system and treat them as they are clearly *meant* to be treated. Since the intention *is* clear that these are all *end of line* markers-- it's explicitly stated, just slightly differently depending on the OS-- Python treats all of the line endings as equal on read if you want it to, either by using str.splitlines() or by opening a text file as "rU". Thank goodness for that :)

In some cases you may need a more pedantic approach to line endings. In that case, just use str.split() :)

--S
-- http://mail.python.org/mailman/listinfo/python-list
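The difference between the two approaches described above can be seen side by side in a small sketch. The sample string is made up for illustration, and the code is written for Python 3, where text-mode open() already applies universal newlines by default, so the "rU" mode is not needed there.

```python
# Made-up sample with three different line endings: \r\n, \r and \n.
text = "spam\r\nham\reggs\n"

# Universal: every common line ending is recognised, and passing True
# (keepends) leaves each terminator attached to its line.
universal = text.splitlines(True)

# Strict: split only on "\n"; any "\r" stays inside the pieces and the
# trailing "\n" produces a final empty string.
strict = text.split("\n")

print(universal)
print(strict)
```

With strict splitting the lone "\r" is not treated as a line break at all, which is the "pedantic" behaviour mentioned above.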
Re: Splitting on '^' ?
On Aug 16, 1:09 pm, kj wrote:
> And .splitlines seems to be able to handle all
> "standard" end-of-line markers without any special
> direction (which, ironically, strikes
> me as a *little* Perlish, somehow):

It's Pythonic. Universal newline-handling for text has been a staple of Python for as long as I can remember (very possibly since the very beginning).

> >>> "spam\015\012ham\015eggs\012".splitlines(True)
> ['spam\r\n', 'ham\r', 'eggs\n']
>
> Amazing. I'm not sure this is the *best* way to do
> this in general (I would have preferred it, and IMHO
> it would have been more Pythonic, if .splitlines
> accepted an additional optional argument [...]).

I believe it's the best way. When you can use a string method instead of a regex, it's definitely most Pythonic to use the string method.

I would argue that this particular string method is Pythonic in design. Remember, Python strives not only for explicitness, but simplicity and ease of use. When dealing with text, universal newlines are much more often than not simpler and easier for the programmer.

John
Re: Splitting on '^' ?
ru...@yahoo.com writes:

>On Aug 14, 2:23 pm, kj wrote:
>> Sometimes I want to split a string into lines, preserving the
>> end-of-line markers. In Perl this is really easy to do, by splitting
>> on the beginning-of-line anchor:
>>
>>   @lines = split /^/, $string;
>>
>> But I can't figure out how to do the same thing with Python. E.g.:

>Why not this?

>>>> lines = 'spam\nham\neggs\n'.splitlines (True)
>>>> lines
>['spam\n', 'ham\n', 'eggs\n']

That's perfect.

And .splitlines seems to be able to handle all "standard" end-of-line markers without any special direction (which, ironically, strikes me as a *little* Perlish, somehow):

>>> "spam\015\012ham\015eggs\012".splitlines(True)
['spam\r\n', 'ham\r', 'eggs\n']

Amazing. I'm not sure this is the *best* way to do this in general (I would have preferred it, and IMHO it would have been more Pythonic, if .splitlines accepted an additional optional argument where one could specify the end-of-line sequence to be used for the splitting, defaulting to the OS's conventional sequence, and then it split *strictly* on that sequence).

But for now this .splitlines will do nicely.

Thanks!

kynn
Re: Splitting on '^' ?
MRAB wrote:
Ethan Furman wrote:
kj wrote:

Sometimes I want to split a string into lines, preserving the end-of-line markers. In Perl this is really easy to do, by splitting on the beginning-of-line anchor:

  @lines = split /^/, $string;

But I can't figure out how to do the same thing with Python. E.g.:

>>> import re
>>> re.split('^', 'spam\nham\neggs\n')
['spam\nham\neggs\n']
>>> re.split('(?m)^', 'spam\nham\neggs\n')
['spam\nham\neggs\n']
>>> bol_re = re.compile('^', re.M)
>>> bol_re.split('spam\nham\neggs\n')
['spam\nham\neggs\n']

Am I doing something wrong?

kynn

As you probably noticed from the other responses: No, you can't split on _and_ keep the splitby text.

You _can_ split and keep what you split on:

>>> re.split("(x)", "abxcd")
['ab', 'x', 'cd']

You _can't_ split on a zero-width match:

>>> re.split("(x*)", "abxcd")
['ab', 'x', 'cd']

but you can use re.sub to replace zero-width matches with something that's not zero-width and then split on that (best with str.split):

>>> re.sub("(x*)", "@", "abxcd")
'@a@b@c@d@'
>>> re.sub("(x*)", "@", "abxcd").split("@")
['', 'a', 'b', 'c', 'd', '']

Wow! I stand corrected, although I'm in danger of falling over from the dizziness! :)

As impressive as that is, I don't think it does what the OP is looking for. rurpy reminded us (or at least me ;) of .splitlines(), which seems to do exactly what the OP is looking for. I do take some comfort that my little snippet works for more than newlines alone, although I'm not aware of any other use-cases. :(

~Ethan~

Oh, hey, how about this?

re.compile('(^[^\n]*\n?)', re.M).findall('text\ntext\ntext')

Although this does give me an extra blank segment at the end... oh well.
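One possible way to keep the findall idea while avoiding an empty final match is to require every match to consume at least one character. This is only a sketch: the function name lines_with_ends is made up for illustration, and the pattern handles "\n" only, not the other line endings that str.splitlines() recognises.

```python
import re

def lines_with_ends(text):
    """Return the lines of text with their trailing "\n" kept (hypothetical helper)."""
    # First alternative: a (possibly empty) run of non-newline characters
    # followed by "\n". Second alternative: a non-empty unterminated run,
    # which can only occur at the very end of the text. Neither alternative
    # can match the empty string, so no blank segment is produced.
    return re.findall(r"[^\n]*\n|[^\n]+", text)

print(lines_with_ends("text\ntext\ntext"))
print(lines_with_ends("text\ntext\ntext\n"))
```

For plain line splitting, str.splitlines(True) remains the simpler choice; the regular expression is only worth considering when the separator is something other than a line ending.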
Re: Splitting on '^' ?
Ethan Furman wrote:
kj wrote:

Sometimes I want to split a string into lines, preserving the end-of-line markers. In Perl this is really easy to do, by splitting on the beginning-of-line anchor:

  @lines = split /^/, $string;

But I can't figure out how to do the same thing with Python. E.g.:

>>> import re
>>> re.split('^', 'spam\nham\neggs\n')
['spam\nham\neggs\n']
>>> re.split('(?m)^', 'spam\nham\neggs\n')
['spam\nham\neggs\n']
>>> bol_re = re.compile('^', re.M)
>>> bol_re.split('spam\nham\neggs\n')
['spam\nham\neggs\n']

Am I doing something wrong?

kynn

As you probably noticed from the other responses: No, you can't split on _and_ keep the splitby text.

You _can_ split and keep what you split on:

>>> re.split("(x)", "abxcd")
['ab', 'x', 'cd']

You _can't_ split on a zero-width match:

>>> re.split("(x*)", "abxcd")
['ab', 'x', 'cd']

but you can use re.sub to replace zero-width matches with something that's not zero-width and then split on that (best with str.split):

>>> re.sub("(x*)", "@", "abxcd")
'@a@b@c@d@'
>>> re.sub("(x*)", "@", "abxcd").split("@")
['', 'a', 'b', 'c', 'd', '']
Re: Splitting on '^' ?
On Aug 14, 2:23 pm, kj wrote: > Sometimes I want to split a string into lines, preserving the > end-of-line markers. In Perl this is really easy to do, by splitting > on the beginning-of-line anchor: > > @lines = split /^/, $string; > > But I can't figure out how to do the same thing with Python. E.g.: Why not this? >>> lines = 'spam\nham\neggs\n'.splitlines (True) >>> lines ['spam\n', 'ham\n', 'eggs\n'] -- http://mail.python.org/mailman/listinfo/python-list
Re: Splitting on '^' ?
> kj (k) wrote: >k> Sometimes I want to split a string into lines, preserving the >k> end-of-line markers. In Perl this is really easy to do, by splitting >k> on the beginning-of-line anchor: >k> @lines = split /^/, $string; >k> But I can't figure out how to do the same thing with Python. E.g.: > import re > re.split('^', 'spam\nham\neggs\n') >k> ['spam\nham\neggs\n'] > re.split('(?m)^', 'spam\nham\neggs\n') >k> ['spam\nham\neggs\n'] > bol_re = re.compile('^', re.M) > bol_re.split('spam\nham\neggs\n') >k> ['spam\nham\neggs\n'] >k> Am I doing something wrong? It says that in the doc of 're': Note that split will never split a string on an empty pattern match. For example: >>> re.split('x*', 'foo') ['foo'] >>> re.split("(?m)^$", "foo\n\nbar\n") ['foo\n\nbar\n'] -- Piet van Oostrum URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4] Private email: p...@vanoostrum.org -- http://mail.python.org/mailman/listinfo/python-list
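The documentation note quoted above describes the re module of that time. Since Python 3.7, re.split() also splits on patterns that match the empty string, so the approach from the original question can be made to work on a current interpreter. A minimal sketch, assuming Python 3.7 or later:

```python
import re

text = "spam\nham\neggs\n"

# Since Python 3.7, re.split() splits on empty matches too, so the
# multiline beginning-of-line anchor can serve as a separator. Empty
# pieces at the edges of the string are filtered out.
pieces = [piece for piece in re.split(r"(?m)^", text) if piece]

print(pieces)
print(pieces == text.splitlines(True))
```

On older interpreters the same call returns the whole string unsplit, as shown in the thread, so str.splitlines(True) is still the more portable answer.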
Re: Splitting on '^' ?
Gary Herron wrote:
kj wrote:

Sometimes I want to split a string into lines, preserving the end-of-line markers. In Perl this is really easy to do, by splitting on the beginning-of-line anchor:

  @lines = split /^/, $string;

But I can't figure out how to do the same thing with Python. E.g.:

>>> import re
>>> re.split('^', 'spam\nham\neggs\n')
['spam\nham\neggs\n']
>>> re.split('(?m)^', 'spam\nham\neggs\n')
['spam\nham\neggs\n']
>>> bol_re = re.compile('^', re.M)
>>> bol_re.split('spam\nham\neggs\n')
['spam\nham\neggs\n']

Am I doing something wrong?

Just split on the EOL character: the "\n":

re.split('\n', 'spam\nham\neggs\n')
['spam', 'ham', 'eggs', '']

The "^" and "$" characters do not match END-OF-LINE, but rather the END-OF-STRING, which was doing you no good.

With the MULTILINE flag "^" matches START-OF-LINE and "$" matches END-OF-LINE or END-OF-STRING. The current re module won't split on a zero-width match.
Re: Splitting on '^' ?
kj wrote:

Sometimes I want to split a string into lines, preserving the end-of-line markers. In Perl this is really easy to do, by splitting on the beginning-of-line anchor:

  @lines = split /^/, $string;

But I can't figure out how to do the same thing with Python. E.g.:

>>> import re
>>> re.split('^', 'spam\nham\neggs\n')
['spam\nham\neggs\n']
>>> re.split('(?m)^', 'spam\nham\neggs\n')
['spam\nham\neggs\n']
>>> bol_re = re.compile('^', re.M)
>>> bol_re.split('spam\nham\neggs\n')
['spam\nham\neggs\n']

Am I doing something wrong?

kynn

As you probably noticed from the other responses: No, you can't split on _and_ keep the splitby text. Looks like you'll have to roll your own.

def splitat(text, sep):
    result = [line + sep for line in text.split(sep)]
    if result[-1] == sep:
        # either remove extra element
        result.pop()
    else:
        # or extra sep from last element
        result[-1] = result[-1][:-len(sep)]
    return result
Re: Splitting on '^' ?
On Fri, Aug 14, 2009 at 2:23 PM, kj wrote: > > > Sometimes I want to split a string into lines, preserving the > end-of-line markers. In Perl this is really easy to do, by splitting > on the beginning-of-line anchor: > > @lines = split /^/, $string; > > But I can't figure out how to do the same thing with Python. E.g.: > > >>> import re > >>> re.split('^', 'spam\nham\neggs\n') > ['spam\nham\neggs\n'] > >>> re.split('(?m)^', 'spam\nham\neggs\n') > ['spam\nham\neggs\n'] > >>> bol_re = re.compile('^', re.M) > >>> bol_re.split('spam\nham\neggs\n') > ['spam\nham\neggs\n'] > > Am I doing something wrong? > > kynn > -- > http://mail.python.org/mailman/listinfo/python-list > You shouldn't use a regular expression for that. >>> from time import time >>> start=time();'spam\nham\neggs\n'.split('\n');print time()-start; ['spam', 'ham', 'eggs', ''] 4.6968460083e-05 >>> import re >>> start=time();re.split(r'\n', 'spam\nham\neggs');print time()-start; ['spam', 'ham', 'eggs'] 0.000284910202026 -- http://mail.python.org/mailman/listinfo/python-list
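A single call timed with time() is dominated by measurement noise, so the comparison above is easier to trust when each call is repeated many times. The following sketch uses the standard timeit module; the repeat count of 100000 is an arbitrary choice, and the actual figures depend on the machine and interpreter.

```python
import re
import timeit

text = "spam\nham\neggs\n"

# Run each variant many times so that per-call overhead averages out.
str_time = timeit.timeit(lambda: text.split("\n"), number=100000)
re_time = timeit.timeit(lambda: re.split(r"\n", text), number=100000)

print("str.split: %.4f s" % str_time)
print("re.split:  %.4f s" % re_time)
```

Both calls return the same list for this input, so the choice between them comes down to speed and readability.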
Re: Splitting on '^' ?
On Fri, Aug 14, 2009 at 5:23 PM, kj wrote: > import re re.split('^', 'spam\nham\neggs\n') > ['spam\nham\neggs\n'] re.split('(?m)^', 'spam\nham\neggs\n') > ['spam\nham\neggs\n'] bol_re = re.compile('^', re.M) bol_re.split('spam\nham\neggs\n') > ['spam\nham\neggs\n'] > > Am I doing something wrong? > Maybe this: >>> import re >>> te = 'spam\nham\neggs\n' >>> pat = '\n' >>> re.split(pat,te) ['spam', 'ham', 'eggs', ''] -- Kind Regards -- http://mail.python.org/mailman/listinfo/python-list
Re: Splitting on '^' ?
On Fri, Aug 14, 2009 at 3:23 PM, kj wrote:
> [snip]
> >>> import re
> >>> re.split('^', 'spam\nham\neggs\n')
> ['spam\nham\neggs\n']
> >>> re.split('(?m)^', 'spam\nham\neggs\n')
> ['spam\nham\neggs\n']
> >>> bol_re = re.compile('^', re.M)
> >>> bol_re.split('spam\nham\neggs\n')
> ['spam\nham\neggs\n']
>
> Am I doing something wrong?

Why not just:

>>> re.split(r'\n', 'spam\nham\neggs')
Re: Splitting on '^' ?
kj wrote: Sometimes I want to split a string into lines, preserving the end-of-line markers. In Perl this is really easy to do, by splitting on the beginning-of-line anchor: @lines = split /^/, $string; But I can't figure out how to do the same thing with Python. E.g.: import re re.split('^', 'spam\nham\neggs\n') ['spam\nham\neggs\n'] re.split('(?m)^', 'spam\nham\neggs\n') ['spam\nham\neggs\n'] bol_re = re.compile('^', re.M) bol_re.split('spam\nham\neggs\n') ['spam\nham\neggs\n'] Am I doing something wrong? Just split on the EOL character: the "\n": re.split('\n', 'spam\nham\neggs\n') ['spam', 'ham', 'eggs', ''] The "^" and "$" characters do not match END-OF-LINE, but rather the END-OF-STRING, which was doing you no good. Gary Herron kynn -- http://mail.python.org/mailman/listinfo/python-list
Splitting on '^' ?
Sometimes I want to split a string into lines, preserving the end-of-line markers. In Perl this is really easy to do, by splitting on the beginning-of-line anchor: @lines = split /^/, $string; But I can't figure out how to do the same thing with Python. E.g.: >>> import re >>> re.split('^', 'spam\nham\neggs\n') ['spam\nham\neggs\n'] >>> re.split('(?m)^', 'spam\nham\neggs\n') ['spam\nham\neggs\n'] >>> bol_re = re.compile('^', re.M) >>> bol_re.split('spam\nham\neggs\n') ['spam\nham\neggs\n'] Am I doing something wrong? kynn -- http://mail.python.org/mailman/listinfo/python-list
Re: Reading files, splitting on a delimiter and newlines.
[EMAIL PROTECTED] wrote:
> On Jul 25, 8:46 am, [EMAIL PROTECTED] wrote:
>
>>Hello,
>>
>>I have a situation where I have a file that contains text similar to:
>>
>>myValue1 = contents of value1
>>myValue2 = contents of value2 but
>>with a new line here
>>myValue3 = contents of value3
>>
>>My first approach was to open the file, use readlines to split the
>>lines on the "=" delimiter into a key/value pair (to be stored in a
>>dict).
>>
>>After processing a couple files I noticed its possible that a newline
>>can be present in the value as shown in myValue2.
>>
>>In this case its not an option to say remove the newlines if its a
>>"multi line" value as the value data needs to stay intact.
>>
>>I'm a bit confused as how to go about getting this to work.
>>
>>Any suggestions on an approach would be greatly appreciated!
>
> Check the length of the list returned from split; this allows
> you to append to the previously extracted value if need be.
>
> import StringIO
> import pprint
>
> buf = """\
> myValue1 = contents of value1
> myValue2 = contents of value2 but
> with a new line here
> myValue3 = contents of value3
> """
>
> mockfile = StringIO.StringIO(buf)
>
> record=dict()
>
> for line in mockfile:
>     kvpair = line.split('=', 2)

You want:
      kvpair = line.split('=', 1)

>>> toto = "x = 42 = 33"
>>> toto.split('=', 2)
['x ', ' 42 ', ' 33']

>     if len(kvpair) == 2:
>         key, value = kvpair
>         record[key] = value
>     else:
>         record[key] += line

Also, this won't handle the case where the first line doesn't contain an '=' (NameError, name 'key' is not defined)
Re: Reading files, splitting on a delimiter and newlines.
[EMAIL PROTECTED] wrote:
> Hello,
>
> I have a situation where I have a file that contains text similar to:
>
> myValue1 = contents of value1
> myValue2 = contents of value2 but
> with a new line here
> myValue3 = contents of value3
>
> My first approach was to open the file, use readlines to split the
> lines on the "=" delimiter into a key/value pair (to be stored in a
> dict).
>
> After processing a couple files I noticed its possible that a newline
> can be present in the value as shown in myValue2.
>
> In this case its not an option to say remove the newlines if its a
> "multi line" value as the value data needs to stay intact.
>
> I'm a bit confused as how to go about getting this to work.
>
> Any suggestions on an approach would be greatly appreciated!
>

data = {}
key = None
for line in open('yourfile.txt'):
    line = line.strip()
    if not line:
        # skip empty lines
        continue
    if '=' in line:
        key, value = map(str.strip, line.split('=', 1))
        data[key] = value
    elif key is None:
        # first line without a '='
        raise ValueError("invalid format")
    else:
        # multiline
        data[key] += "\n" + line

print data
=> {'myValue3': 'contents of value3',
    'myValue2': 'contents of value2 but\nwith a new line here',
    'myValue1': 'contents of value1'}

HTH
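The same logic can be wrapped in a function and run on the sample data without a file on disk. This is a sketch for Python 3; the function name parse_pairs and the in-memory sample are assumptions made for illustration, not part of the original post.

```python
def parse_pairs(lines):
    """Build a dict from 'key = value' lines, joining continuation lines
    onto the most recent key (hypothetical helper)."""
    data = {}
    key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            # Skip empty lines.
            continue
        if "=" in line:
            # Split on the first '=' only, so a value may contain '='.
            key, value = (part.strip() for part in line.split("=", 1))
            data[key] = value
        elif key is None:
            # A continuation line before any key has been seen.
            raise ValueError("invalid format")
        else:
            # Continuation of the previous value.
            data[key] += "\n" + line
    return data


sample = """\
myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3
"""

record = parse_pairs(sample.splitlines())
print(record)
```

Like the original, this treats any line containing '=' as a new key, so a continuation line that happens to contain '=' would be misread as a new pair.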
Re: Reading files, splitting on a delimiter and newlines.
: <[EMAIL PROTECTED]> Wrote: > On Jul 25, 10:46 am, [EMAIL PROTECTED] wrote: > > Hello, > > > > I have a situation where I have a file that contains text similar to: > > > > myValue1 = contents of value1 > > myValue2 = contents of value2 but > > with a new line here > > myValue3 = contents of value3 > > > > My first approach was to open the file, use readlines to split the > > lines on the "=" delimiter into a key/value pair (to be stored in a > > dict). > > > > After processing a couple files I noticed its possible that a newline > > can be present in the value as shown in myValue2. > > > > In this case its not an option to say remove the newlines if its a > > "multi line" value as the value data needs to stay intact. > > > > I'm a bit confused as how to go about getting this to work. > > > > Any suggestions on an approach would be greatly appreciated! > > I'm confused. You don't want the newline to be present, but you can't > remove it because the data has to stay intact? If you don't want to > change it, then what's the problem? I think the OP's trouble is that the value he wants gets split up by the newline at the end of the line when he uses readline(). One can try adding the single value to the previous value in the previous key/value pair when the split does not yield two values - a bit hackish, but given structured input data it might work. - Hendrik -- http://mail.python.org/mailman/listinfo/python-list
Re: Reading files, splitting on a delimiter and newlines.
On Jul 25, 7:56 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > On Jul 25, 8:46 am, [EMAIL PROTECTED] wrote: > > > > > Hello, > > > I have a situation where I have a file that contains text similar to: > > > myValue1 = contents of value1 > > myValue2 = contents of value2 but > > with a new line here > > myValue3 = contents of value3 > > > My first approach was to open the file, use readlines to split the > > lines on the "=" delimiter into a key/value pair (to be stored in a > > dict). > > > After processing a couple files I noticed its possible that a newline > > can be present in the value as shown in myValue2. > > > In this case its not an option to say remove the newlines if its a > > "multi line" value as the value data needs to stay intact. > > > I'm a bit confused as how to go about getting this to work. > > > Any suggestions on an approach would be greatly appreciated! > > Check the length of the list returned from split; this allows > your to append to the previously extracted value if need be. > > import StringIO > import pprint > > buf = """\ > myValue1 = contents of value1 > myValue2 = contents of value2 but >with a new line here > myValue3 = contents of value3 > """ > > mockfile = StringIO.StringIO(buf) > > record=dict() > > for line in mockfile: > kvpair = line.split('=', 2) > if len(kvpair) == 2: > key, value = kvpair > record[key] = value > else: > record[key] += line > > pprint.pprint(record) > > # lstrip() to remove newlines if needed ... > > -- > Hope this helps, > Steven Great thank you! That was the logic I was looking for. -- http://mail.python.org/mailman/listinfo/python-list
Re: Reading files, splitting on a delimiter and newlines.
On Jul 25, 8:46 am, [EMAIL PROTECTED] wrote:
> Hello,
>
> I have a situation where I have a file that contains text similar to:
>
> myValue1 = contents of value1
> myValue2 = contents of value2 but
> with a new line here
> myValue3 = contents of value3
>
> My first approach was to open the file, use readlines to split the
> lines on the "=" delimiter into a key/value pair (to be stored in a
> dict).
>
> After processing a couple files I noticed its possible that a newline
> can be present in the value as shown in myValue2.
>
> In this case its not an option to say remove the newlines if its a
> "multi line" value as the value data needs to stay intact.
>
> I'm a bit confused as how to go about getting this to work.
>
> Any suggestions on an approach would be greatly appreciated!

Check the length of the list returned from split; this allows you to append to the previously extracted value if need be.

import StringIO
import pprint

buf = """\
myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3
"""

mockfile = StringIO.StringIO(buf)

record=dict()

for line in mockfile:
    kvpair = line.split('=', 2)
    if len(kvpair) == 2:
        key, value = kvpair
        record[key] = value
    else:
        record[key] += line

pprint.pprint(record)

# lstrip() to remove newlines if needed ...

--
Hope this helps,
Steven
Re: Reading files, splitting on a delimiter and newlines.
On Jul 26, 3:08 am, Stargaming <[EMAIL PROTECTED]> wrote:
> On Wed, 25 Jul 2007 09:16:26 -0700, kyosohma wrote:
> > On Jul 25, 10:46 am, [EMAIL PROTECTED] wrote:
> >> Hello,
>
> >> I have a situation where I have a file that contains text similar to:
>
> >> myValue1 = contents of value1
> >> myValue2 = contents of value2 but
> >> with a new line here
> >> myValue3 = contents of value3
>
> >> My first approach was to open the file, use readlines to split the
> >> lines on the "=" delimiter into a key/value pair (to be stored in a
> >> dict).
>
> >> After processing a couple files I noticed its possible that a newline
> >> can be present in the value as shown in myValue2.
>
> >> In this case its not an option to say remove the newlines if its a
> >> "multi line" value as the value data needs to stay intact.
>
> >> I'm a bit confused as how to go about getting this to work.
>
> >> Any suggestions on an approach would be greatly appreciated!
>
> > I'm confused. You don't want the newline to be present, but you can't
> > remove it because the data has to stay intact? If you don't want to
> > change it, then what's the problem?
>
> > Mike
>
> It's obvious that simple line-by-line filtering won't handle multi-line
> statements.
>
> You could solve that by saving the last item you added something to and,
> if the line currently handled doesn't look like an assignment, append it
> to this item. You might run into problems with such data:
>
> foo = modern maths
> proved that 1 = 1
> bar = single
>
> If your dataset always has indentation on subsequent lines, you might use
> this. Or if the key's name is always just one word.
>

My take: all of the above, plus: Given that you want to extract stuff of the form <LHS> = <RHS>, I'd suggest developing a fairly precise regular expression for LHS, maybe even for RHS, and trying this on as many of these files as you can.

Why an RE for RHS? Consider:

foo = somebody said "I think that REs = trouble maybe_better = pyparsing"

:-)
Re: Reading files, splitting on a delimiter and newlines.
On Wed, 25 Jul 2007 09:16:26 -0700, kyosohma wrote:

> On Jul 25, 10:46 am, [EMAIL PROTECTED] wrote:
>> Hello,
>>
>> I have a situation where I have a file that contains text similar to:
>>
>> myValue1 = contents of value1
>> myValue2 = contents of value2 but
>> with a new line here
>> myValue3 = contents of value3
>>
>> My first approach was to open the file, use readlines to split the
>> lines on the "=" delimiter into a key/value pair (to be stored in a
>> dict).
>>
>> After processing a couple files I noticed its possible that a newline
>> can be present in the value as shown in myValue2.
>>
>> In this case its not an option to say remove the newlines if its a
>> "multi line" value as the value data needs to stay intact.
>>
>> I'm a bit confused as how to go about getting this to work.
>>
>> Any suggestions on an approach would be greatly appreciated!
>
> I'm confused. You don't want the newline to be present, but you can't
> remove it because the data has to stay intact? If you don't want to
> change it, then what's the problem?
>
> Mike

It's obvious that simple line-by-line filtering won't handle multi-line statements.

You could solve that by saving the last item you added something to and, if the line currently handled doesn't look like an assignment, append it to this item. You might run into problems with such data:

foo = modern maths
proved that 1 = 1
bar = single

If your dataset always has indentation on subsequent lines, you might use this. Or if the key's name is always just one word.

HTH,
Stargaming
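The "key is always just one word" idea can be sketched with a regular expression that only accepts a single word before the '='. The pattern, the function name parse_single_word_keys and the sample are assumptions for illustration; real data may need a stricter or looser definition of a key.

```python
import re

# A new item starts with exactly one word, optional spaces, then '='.
KEY_LINE = re.compile(r"^(\w+)\s*=\s*(.*)$")

def parse_single_word_keys(text):
    """Parse 'key = value' items whose keys are single words; any other
    line is appended to the previous value (hypothetical helper)."""
    data = {}
    key = None
    for line in text.splitlines():
        match = KEY_LINE.match(line)
        if match:
            key, value = match.groups()
            data[key] = value
        elif key is not None:
            # Not an assignment: continuation of the last item.
            data[key] += "\n" + line
        else:
            raise ValueError("continuation line before any key")
    return data


sample = "foo = modern maths\nproved that 1 = 1\nbar = single\n"
result = parse_single_word_keys(sample)
print(result)
```

Here "proved that 1 = 1" is kept as part of foo because several words precede its '='. A continuation line such as "x = 1" would still be mistaken for a new key, so this is a heuristic rather than a complete parser.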
Re: Reading files, splitting on a delimiter and newlines.
On Jul 25, 10:46 am, [EMAIL PROTECTED] wrote: > Hello, > > I have a situation where I have a file that contains text similar to: > > myValue1 = contents of value1 > myValue2 = contents of value2 but > with a new line here > myValue3 = contents of value3 > > My first approach was to open the file, use readlines to split the > lines on the "=" delimiter into a key/value pair (to be stored in a > dict). > > After processing a couple files I noticed its possible that a newline > can be present in the value as shown in myValue2. > > In this case its not an option to say remove the newlines if its a > "multi line" value as the value data needs to stay intact. > > I'm a bit confused as how to go about getting this to work. > > Any suggestions on an approach would be greatly appreciated! I'm confused. You don't want the newline to be present, but you can't remove it because the data has to stay intact? If you don't want to change it, then what's the problem? Mike -- http://mail.python.org/mailman/listinfo/python-list
Reading files, splitting on a delimiter and newlines.
Hello, I have a situation where I have a file that contains text similar to: myValue1 = contents of value1 myValue2 = contents of value2 but with a new line here myValue3 = contents of value3 My first approach was to open the file, use readlines to split the lines on the "=" delimiter into a key/value pair (to be stored in a dict). After processing a couple files I noticed its possible that a newline can be present in the value as shown in myValue2. In this case its not an option to say remove the newlines if its a "multi line" value as the value data needs to stay intact. I'm a bit confused as how to go about getting this to work. Any suggestions on an approach would be greatly appreciated! -- http://mail.python.org/mailman/listinfo/python-list
Re: Splitting on a word
Hi Bernhard,
firstly, you must excuse my English ("angry" is a little ...strong, but my vocabulary is limited). I hope that the experts keep on helping us newbies.

Even though I am a newbie (in Python), I disagree with you: my solution (with the help of Joe) answers the problem of splitting a string using a delimiter of more than one character (sometimes a word as delimiter, but that is not required). The code I supplied can be misleading because it is centered on web parsing, but my request is more general (next time I will only ask the question, without examples!).

If I were a professional programmer I could agree with you about the "Batteries included" concept and all the other considerations ("off-the-shelf solutions" and ...not reinventing the wheel). Also, the terrific example you supply in order to caution me not to follow dully (found in the dictionary) the "simple & short" concept doesn't apply to me (too complicated!).

I am so far from being a real programmer that when an error occurs, I use try/except (if they solve the problem) without caring about the source of the mistake (...EAFP!). So I don't care too much about possible future mistakes (even if the code takes capital letters into account). For the specific case I mentioned, if the closing tag ">" is missing, perhaps I obtain wrong results... I will worry when necessary (even with Murphy's law...).

Bye.
Re: Splitting on a word
[EMAIL PROTECTED] wrote:
> Bernard... don't get angry, but I prefer the solution of Joe.

Oh. If I got angry in such a case, I would have stopped responding to such posts long ago. You know the background... and you'll have to bear the consequences. ;-)

> ...
> for me "pythonic" means simple and short (I may be wrong...).

It's your definition, isn't it? One of the most important advantages of Python (for me!) besides its readability is that it comes with "Batteries included", which means that I can benefit from the work others did before, and that I can rely on its quality.

The solution which I proposed is nothing but the test code from htmllib, stripped down to the absolute minimum, enriched with the print command to show the anchor list. If I had to write production-level code of your sort, I'd take such an off-the-shelf solution, because it minimizes the risk of failures. Think only of issues like these:

- does your code find a tag like or references with/without " ...?
- does it survive ill-coded html after all?

I've made the experience that it's usually better to rely on such "library" code than to reinvent the wheel. There's often a reason to take another approach.

I'd agree that a simple and short solution is fascinating. However, every simple and short solution should be readable. As a terrific example, here's a very tiny piece of code, which does nothing but calculate the prime numbers up to 1000:

print filter(None,map(lambda y:y*reduce(lambda x,y:x*y!=0,
map(lambda x,y=y:y%x,range(2,int(pow(y,0.5)+1))),1),
range(2,1000)))

- simple (depends on your familiarity with stuff like map and lambda)
- short (compared with different solutions)
- and veeeyyy pythonic!

Bernhard
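For contrast with the one-liner, here is a longer but more readable sketch of the same calculation (trial division by every integer up to the square root). The function name primes_below is made up for illustration, and the code is written for Python 3.

```python
def primes_below(limit):
    """Return all primes smaller than limit, using trial division
    (hypothetical helper)."""
    primes = []
    for candidate in range(2, limit):
        # candidate is prime when no divisor from 2 up to its square
        # root divides it evenly (all() of an empty range is True).
        if all(candidate % divisor for divisor in range(2, int(candidate ** 0.5) + 1)):
            primes.append(candidate)
    return primes


print(primes_below(1000))
```

It is several lines instead of one expression, but each step can be read and checked on its own, which is the point being made about readability.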
Re: Splitting on a word
Hi all,
thanks for your contributions.

To Robert Kern I can reply that I know BeautifulSoup, but mine was meant to be a "generalization" (only incidentally used in a web parsing application).

The fact is that, being a "macho newbie" programmer (the "macho" is from Steven D'Aprano), I wanted to show what beautiful solutions I can find... Luckily there is Joe, who shows me that most of my "beautiful" code (working, of course!) can be replaced by:

list=p.split(s)

Bernard... don't get angry, but I prefer the solution of Joe. It is more general, and, besides that, for me "pythonic" means simple and short (I may be wrong...).

By the way, I have found an alternative solution to the problem of "unique" lists, without sorting, but not being "macho" enough...

Bye.
Re: Splitting on a word
[EMAIL PROTECTED] wrote:
> Hi all,
> I am writing a script to visualize (and print)
> the web references hidden in the html files as:
> ' underlined reference'
> Optimizing my code, I found that an essential step is:
> splitting on a word (in this case 'href').
>
> I am asking if there is some alternative (more pythonic...):

Sure. The htmllib module provides HTMLParser. Here's an example; run it with your HTML file as argument and you'll see a list of all the hrefs in the document.

    #!/usr/bin/python
    import htmllib

    def test():
        import sys, formatter
        file = sys.argv[1]
        f = open(file, 'r')
        data = f.read()
        f.close()
        # NullFormatter discards the rendered text; only the anchors matter here
        f = formatter.NullFormatter()
        p = htmllib.HTMLParser(f)
        p.feed(data)
        for a_link in p.anchorlist:
            print a_link
        p.close()

    test()

I'm sure that this is far more Pythonic!

Bernhard
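The htmllib module used in this message exists only in Python 2; it was removed in Python 3. A rough Python 3 counterpart, assuming the standard-library html.parser module, could be sketched as follows. The class name AnchorCollector and the sample markup are made up for this example; the anchorlist attribute is named after the htmllib one only for familiarity.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names,
        # so <A HREF=...> is handled the same way as <a href=...>.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.anchorlist.append(value)


# Made-up sample markup with mixed case and mixed quoting.
sample = ('<p><a href="http://www.python.org">python</a> and '
          '<A HREF=page2.html>other</A></p>')

parser = AnchorCollector()
parser.feed(sample)
parser.close()
for a_link in parser.anchorlist:
    print(a_link)
```

As with the htmllib version, the quoting and case questions are left to the parser rather than to hand-written string splitting.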
Re: Splitting on a word
    import re

    # string s simulating an html file
    s='ffy: ytrty python fyt wx dtrtf'
    p=re.compile(r'\bhref\b',re.I)
    list=p.split(s) #< gets you your final list.

good luck,
Joe
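Because the sample string in this archive has lost its markup, the effect of the split is hard to see. The sketch below uses a made-up stand-in string (not the poster's original) to show what the compiled pattern does. It is written for Python 3 and uses the name pieces instead of list, only to avoid shadowing the built-in.

```python
import re

# Made-up stand-in for the HTML-like test string.
s = 'one <a href="a.html">A</a> two <A HREF="b.html">B</A> three'

# \b keeps 'href' from matching inside longer words; re.I ignores case.
p = re.compile(r'\bhref\b', re.I)
pieces = p.split(s)

print(pieces)
# ['one <a ', '="a.html">A</a> two <A ', '="b.html">B</A> three']
```

Both spellings act as split points in a single call, so no intermediate delimiter character is needed.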
Re: Splitting on a word
On Wed, 13 Jul 2005 06:19:54 -0700, qwweeeit wrote:

> Hi all,
> I am writing a script to visualize (and print)
> the web references hidden in the html files as:
> ' underlined reference'
> Optimizing my code,

[red rag to bull]

Because it was too slow? Or just to prove what a macho programmer you are? Is your code even working yet? If it isn't working, you shouldn't be trying to optimize buggy code.

> I found that an essential step is:
> splitting on a word (in this case 'href').

Then just do it:

py> ' underlined reference'.split('href')
[' underlined reference']

If you are concerned about case issues, you can either convert the entire HTML file to lowercase, or you might write a case-insensitive regular expression to replace any "href", regardless of case, with the lowercase version.

[snip]

> To be sure as delimiter I choose chr(127)
> which surely is not present in the html file.

I wouldn't bet my life on that. I've found some weird characters in HTML files.

--
Steven.
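A short sketch of the two points made in this reply, using a made-up string and Python 3: str.split is case-sensitive, so 'HREF' is missed unless the case is normalized first, and a delimiter such as chr(127) is only safe if you check that it does not already occur in the text. The variable names are illustrative.

```python
import re

# Made-up sample containing both spellings.
html = '<a href="a.html">A</a> <A HREF="b.html">B</A>'

# Plain str.split is case-sensitive: only the lowercase 'href' splits.
plain = html.split('href')
print(len(plain))  # 2

# Rewrite every spelling of 'href' to lowercase, then split as before.
normalized = re.sub(r'href', 'href', html, flags=re.IGNORECASE)
parts = normalized.split('href')
print(len(parts))  # 3

# A "surely absent" delimiter should be verified, not assumed.
delimiter = chr(127)
delimiter_is_safe = delimiter not in html
print(delimiter_is_safe)  # True for this sample only
```

Normalizing only the word 'href' leaves the rest of the document's case untouched, unlike lowercasing the whole file.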
Re: Splitting on a word
[EMAIL PROTECTED] wrote:
> Hi all,
> I am writing a script to visualize (and print)
> the web references hidden in the html files as:
> ' underlined reference'
> Optimizing my code, I found that an essential step is:
> splitting on a word (in this case 'href').
>
> I am asking if there is some alternative (more pythonic...):

For *this* particular task, certainly. It begins with

    import BeautifulSoup

The rest is left as a (brief) exercise for the reader. :-)

As for the more general task of splitting strings using regular expressions, see re.split().

--
Robert Kern
[EMAIL PROTECTED]

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter
Splitting on a word
Hi all,
I am writing a script to visualize (and print)
the web references hidden in the html files as:
' underlined reference'
Optimizing my code, I found that an essential step is:
splitting on a word (in this case 'href').

I am asking if there is some alternative (more pythonic...):

    # SplitMultichar.py

    import re

    # string s simulating an html file
    s='ffy: ytrty python fyt wx dtrtf'
    p=re.compile(r'\bhref\b',re.I)

    lHref=p.findall(s) # lHref=['href','HREF']
    # for normal html files the lHref list has more elements
    # (more web references)

    c='~' # char to be used as delimiter
    # c=chr(127) # char to be used as delimiter
    for i in lHref:
        s=s.replace(i,c)

    # s ='ffy: ytrty python fyt wx dtrtf'
    list=s.split(c)
    # list=['ffy: ytrty python fyt wx dtrtf']

If you save the original s string to xxx.html, any browser can visualize it.
To be sure as delimiter I choose chr(127)
which surely is not present in the html file.
Bye.