Re: Splitting on '^' ?

2009-08-16 Thread Stephen Hansen
And .splitlines seems to be able to handle all "standard" end-of-line
> markers without any special direction (which, ironically, strikes
> me as a *little* Perlish, somehow):
>
> >>> "spam\015\012ham\015eggs\012".splitlines(True)
> ['spam\r\n', 'ham\r', 'eggs\n']
>

... actually "working correctly" and robustly is "perlish"? :)

The only reason I've ever actually used this method is this very feature of
it, that you can't readily reproduce with other methods unless you start
getting into regular expressions (and I really believe regular expressions
should not be the default place one looks to solve a problem in Python)
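
For readers who want to try the feature being praised here, a minimal sketch follows. It uses Python 3 syntax, which is an assumption (the thread predates it); the input is kj's mixed-ending string written with \r and \n escapes.

```python
# Mixed line endings, as in kj's example: CRLF, bare CR, bare LF.
text = "spam\r\nham\reggs\n"

kept = text.splitlines(True)   # keepends=True: terminators are preserved
bare = text.splitlines()       # default: terminators are dropped

print(kept)   # ['spam\r\n', 'ham\r', 'eggs\n']
print(bare)   # ['spam', 'ham', 'eggs']
```

Because the terminators are kept, "".join(kept) gives back the original string unchanged.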

Then again, as soon as Python started allowing you to open files with mode
"rU", I gleefully ran through my codebase and changed every operation to
that and made sure to write out with platform-local newlines exclusively,
thus finally flipping off those darn files that users kept producing with
mixed line endings.


> Amazing.  I'm not sure this is the *best* way to do this in general
> (I would have preferred it, and IMHO it would have been more
> Pythonic, if .splitlines accepted an additional optional argument
> where one could specify the end-of-line sequence to be used for
> the splitting, defaulting to the OS's conventional sequence, and
> then it split *strictly* on that sequence).
>

If you want strict and absolute splitting, you don't need another method;
just do mystring.split(os.linesep); I mean sure, it doesn't have the
'keepends' feature -- but I don't actually understand why you want keepends
with a strict definition of endings... If you /only/ want to split on \n,
you know there's an \n on the end of each line in the returned list and can
easily be sure to write it out (for example) :)
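
To make the contrast concrete, here is a small sketch of strict splitting next to .splitlines(). It splits on a literal "\n" rather than os.linesep so the result does not depend on the platform; that substitution is an assumption made for illustration only.

```python
text = "spam\r\nham\neggs\n"

strict = text.split("\n")      # splits only on LF; a CR stays attached
loose = text.splitlines()      # treats CRLF, CR and LF as line ends

print(strict)   # ['spam\r', 'ham', 'eggs', '']
print(loose)    # ['spam', 'ham', 'eggs']
```

Note the trailing empty string from str.split when the text ends with the separator; .splitlines() does not produce one.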

In the modern world of mixed systems and the internet, and files being flung
around willy-nilly, and editors being configured to varying degrees of
correctness, and such, it's Pythonic to be able to handle all these files
that anyone made on any system and treat them as they are clearly *meant* to
be treated. Since the intention *is* clear that these are all *end of line*
markers (it's explicitly stated, just slightly differently depending on the
OS), Python treats all of the line-endings as equal on read if you want it
to, by using either str.splitlines() or opening a text file as "rU". Thank
goodness for that :)
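
A note for readers on Python 3: the "U" mode flag was later deprecated and removed in Python 3.11, and text-mode open() translates line endings by default (newline=None). The sketch below assumes Python 3 and uses a throwaway temporary file purely for demonstration.

```python
import os
import tempfile

# Write a file with deliberately mixed line endings (bytes, so nothing is translated).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"spam\r\nham\reggs\n")
    path = tmp.name

try:
    # newline=None is the text-mode default: CRLF, CR and LF all read back as "\n".
    with open(path, "r", newline=None, encoding="ascii") as handle:
        lines = handle.readlines()
finally:
    os.remove(path)

print(lines)   # ['spam\n', 'ham\n', 'eggs\n']
```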

In some cases you may need a more pedantic approach to line endings. In that
case, just use str.split() :)

--S
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Splitting on '^' ?

2009-08-16 Thread John Yeung
On Aug 16, 1:09 pm, kj  wrote:
> And .splitlines seems to be able to handle all
> "standard" end-of-line markers without any special
> direction (which, ironically, strikes
> me as a *little* Perlish, somehow):

It's Pythonic.  Universal newline-handling for text has been a staple
of Python for as long as I can remember (very possibly since the very
beginning).

> >>> "spam\015\012ham\015eggs\012".splitlines(True)
>
> ['spam\r\n', 'ham\r', 'eggs\n']
>
> Amazing.  I'm not sure this is the *best* way to do
> this in general (I would have preferred it, and IMHO
> it would have been more Pythonic, if .splitlines
> accepted an additional optional argument [...]).

I believe it's the best way.  When you can use a string method instead
of a regex, it's definitely most Pythonic to use the string method.

I would argue that this particular string method is Pythonic in
design.  Remember, Python strives not only for explicitness, but
simplicity and ease of use.  When dealing with text, universal
newlines are much more often than not simpler and easier for the
programmer.

John


Re: Splitting on '^' ?

2009-08-16 Thread kj
In  
ru...@yahoo.com writes:

>On Aug 14, 2:23 pm, kj  wrote:
>> Sometimes I want to split a string into lines, preserving the
>> end-of-line markers.  In Perl this is really easy to do, by splitting
>> on the beginning-of-line anchor:
>>
>>   @lines = split /^/, $string;
>>
>> But I can't figure out how to do the same thing with Python.  E.g.:

>Why not this?

>>>> lines = 'spam\nham\neggs\n'.splitlines (True)
>>>> lines
>['spam\n', 'ham\n', 'eggs\n']

That's perfect.

And .splitlines seems to be able to handle all "standard" end-of-line
markers without any special direction (which, ironically, strikes
me as a *little* Perlish, somehow):

>>> "spam\015\012ham\015eggs\012".splitlines(True)
['spam\r\n', 'ham\r', 'eggs\n']

Amazing.  I'm not sure this is the *best* way to do this in general
(I would have preferred it, and IMHO it would have been more
Pythonic, if .splitlines accepted an additional optional argument
where one could specify the end-of-line sequence to be used for
the splitting, defaulting to the OS's conventional sequence, and
then it split *strictly* on that sequence).

But for now this .splitlines will do nicely.

Thanks!

kynn


Re: Splitting on '^' ?

2009-08-14 Thread Ethan Furman

MRAB wrote:

Ethan Furman wrote:


kj wrote:



Sometimes I want to split a string into lines, preserving the
end-of-line markers.  In Perl this is really easy to do, by splitting
on the beginning-of-line anchor:

  @lines = split /^/, $string;

But I can't figure out how to do the same thing with Python.  E.g.:



import re
re.split('^', 'spam\nham\neggs\n')



['spam\nham\neggs\n']


re.split('(?m)^', 'spam\nham\neggs\n')



['spam\nham\neggs\n']


bol_re = re.compile('^', re.M)
bol_re.split('spam\nham\neggs\n')



['spam\nham\neggs\n']

Am I doing something wrong?

kynn



As you probably noticed from the other responses:  No, you can't split 
on _and_ keep the splitby text.



You _can_ split and keep what you split on:

 >>> re.split("(x)", "abxcd")
['ab', 'x', 'cd']

You _can't_ split on a zero-width match:

 >>> re.split("(x*)", "abxcd")
['ab', 'x', 'cd']

but you can use re.sub to replace zero-width matches with something
that's not zero-width and then split on that (best with str.split):

 >>> re.sub("(x*)", "@", "abxcd")
'@a@b@c@d@'
 >>> re.sub("(x*)", "@", "abxcd").split("@")
['', 'a', 'b', 'c', 'd', '']


Wow!  I stand corrected, although I'm in danger of falling over from the 
dizziness!  :)


As impressive as that is, I don't think it does what the OP is looking 
for.  rurpy reminded us (or at least me ;) of .splitlines(), which seems 
to do exactly what the OP is looking for.  I do take some comfort that 
my little snippet works for more than newlines alone, although I'm not 
aware of any other use-cases.  :(


~Ethan~

Oh, hey, how about this?

re.compile('(^[^\n]*\n?)', re.M).findall('text\ntext\ntext')

Although this does give me an extra blank segment at the end... oh well.
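
One possible way to avoid that extra blank segment, offered as a sketch rather than as what Ethan intended: match either a run ending in a newline or a final non-empty run, so that nothing can match at the very end of the string. The name LINE_RE is made up for this example.

```python
import re

# One line = any run of non-newlines followed by a newline,
# or a final non-empty run with no newline at all.
LINE_RE = re.compile(r"[^\n]*\n|[^\n]+")

print(LINE_RE.findall("text\ntext\ntext"))     # ['text\n', 'text\n', 'text']
print(LINE_RE.findall("spam\nham\neggs\n"))    # ['spam\n', 'ham\n', 'eggs\n']
```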


Re: Splitting on '^' ?

2009-08-14 Thread MRAB

Ethan Furman wrote:

kj wrote:


Sometimes I want to split a string into lines, preserving the
end-of-line markers.  In Perl this is really easy to do, by splitting
on the beginning-of-line anchor:

  @lines = split /^/, $string;

But I can't figure out how to do the same thing with Python.  E.g.:



import re
re.split('^', 'spam\nham\neggs\n')


['spam\nham\neggs\n']


re.split('(?m)^', 'spam\nham\neggs\n')


['spam\nham\neggs\n']


bol_re = re.compile('^', re.M)
bol_re.split('spam\nham\neggs\n')


['spam\nham\neggs\n']

Am I doing something wrong?

kynn


As you probably noticed from the other responses:  No, you can't split 
on _and_ keep the splitby text.



You _can_ split and keep what you split on:

>>> re.split("(x)", "abxcd")
['ab', 'x', 'cd']

You _can't_ split on a zero-width match:

>>> re.split("(x*)", "abxcd")
['ab', 'x', 'cd']

but you can use re.sub to replace zero-width matches with something
that's not zero-width and then split on that (best with str.split):

>>> re.sub("(x*)", "@", "abxcd")
'@a@b@c@d@'
>>> re.sub("(x*)", "@", "abxcd").split("@")
['', 'a', 'b', 'c', 'd', '']


Re: Splitting on '^' ?

2009-08-14 Thread rurpy
On Aug 14, 2:23 pm, kj  wrote:
> Sometimes I want to split a string into lines, preserving the
> end-of-line markers.  In Perl this is really easy to do, by splitting
> on the beginning-of-line anchor:
>
>   @lines = split /^/, $string;
>
> But I can't figure out how to do the same thing with Python.  E.g.:

Why not this?

>>> lines = 'spam\nham\neggs\n'.splitlines (True)
>>> lines
['spam\n', 'ham\n', 'eggs\n']


Re: Splitting on '^' ?

2009-08-14 Thread Piet van Oostrum
> kj  (k) wrote:

>k> Sometimes I want to split a string into lines, preserving the
>k> end-of-line markers.  In Perl this is really easy to do, by splitting
>k> on the beginning-of-line anchor:

>k>   @lines = split /^/, $string;

>k> But I can't figure out how to do the same thing with Python.  E.g.:

>k> >>> import re
>k> >>> re.split('^', 'spam\nham\neggs\n')
>k> ['spam\nham\neggs\n']
>k> >>> re.split('(?m)^', 'spam\nham\neggs\n')
>k> ['spam\nham\neggs\n']
>k> >>> bol_re = re.compile('^', re.M)
>k> >>> bol_re.split('spam\nham\neggs\n')
>k> ['spam\nham\neggs\n']

>k> Am I doing something wrong?

It says that in the doc of 're':
Note that split will never split a string on an empty pattern match. For
example: 
>>> re.split('x*', 'foo')
['foo']
>>> re.split("(?m)^$", "foo\n\nbar\n")
['foo\n\nbar\n']
-- 
Piet van Oostrum 
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: p...@vanoostrum.org


Re: Splitting on '^' ?

2009-08-14 Thread MRAB

Gary Herron wrote:

kj wrote:

Sometimes I want to split a string into lines, preserving the
end-of-line markers.  In Perl this is really easy to do, by splitting
on the beginning-of-line anchor:

  @lines = split /^/, $string;

But I can't figure out how to do the same thing with Python.  E.g.:

  

import re
re.split('^', 'spam\nham\neggs\n')


['spam\nham\neggs\n']
  

re.split('(?m)^', 'spam\nham\neggs\n')


['spam\nham\neggs\n']
  

bol_re = re.compile('^', re.M)
bol_re.split('spam\nham\neggs\n')


['spam\nham\neggs\n']

Am I doing something wrong?
  

Just split on the EOL character:  the "\n":
re.split('\n', 'spam\nham\neggs\n')
['spam', 'ham', 'eggs', '']

The "^" and "$" characters do not match END-OF-LINE, but rather the  
END-OF-STRING, which was doing you no good.



With the MULTILINE flag "^" matches START-OF-LINE and "$" matches
END-OF-LINE or END-OF-STRING.

The current re module won't split on a zero-width match.
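
That was true of the re module at the time of this thread. Since Python 3.7, re.split does split on empty matches, so kj's original idea can be made to work on a newer interpreter. A minimal sketch, assuming Python 3.7 or later; empty strings at the edges are filtered out rather than relied upon.

```python
import re

text = "spam\nham\neggs\n"

# (?m)^ is zero-width; Python 3.7+ splits on it, older versions did not.
pieces = re.split(r"(?m)^", text)

# Drop any empty strings produced at the start or end of the string.
lines = [piece for piece in pieces if piece]
print(lines)   # ['spam\n', 'ham\n', 'eggs\n']
```

For plain line splitting, .splitlines(True) remains the simpler tool; this only shows that the regex route is no longer a dead end.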


Re: Splitting on '^' ?

2009-08-14 Thread Ethan Furman

kj wrote:


Sometimes I want to split a string into lines, preserving the
end-of-line markers.  In Perl this is really easy to do, by splitting
on the beginning-of-line anchor:

  @lines = split /^/, $string;

But I can't figure out how to do the same thing with Python.  E.g.:



import re
re.split('^', 'spam\nham\neggs\n')


['spam\nham\neggs\n']


re.split('(?m)^', 'spam\nham\neggs\n')


['spam\nham\neggs\n']


bol_re = re.compile('^', re.M)
bol_re.split('spam\nham\neggs\n')


['spam\nham\neggs\n']

Am I doing something wrong?

kynn


As you probably noticed from the other responses:  No, you can't split 
on _and_ keep the splitby text.


Looks like you'll have to roll your own.

def splitat(text, sep):
    result = [line + sep for line in text.split(sep)]
    if result[-1] == sep:  # either remove extra element
        result.pop()
    else:   # or extra sep from last element
        result[-1] = result[-1][:-len(sep)]
    return result
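
An alternative sketch of the same idea, scanning with str.find instead of splitting and patching up the last element afterwards. The name split_keep_sep is hypothetical and not from the thread; it keeps the separator on each piece and works for separators longer than one character.

```python
def split_keep_sep(text, sep):
    """Split text after each occurrence of sep, keeping sep on each piece."""
    if not sep:
        raise ValueError("empty separator")
    pieces = []
    start = 0
    while True:
        index = text.find(sep, start)
        if index == -1:
            break
        end = index + len(sep)
        pieces.append(text[start:end])
        start = end
    if start < len(text):          # trailing text with no separator
        pieces.append(text[start:])
    return pieces


print(split_keep_sep("spam\nham\neggs\n", "\n"))   # ['spam\n', 'ham\n', 'eggs\n']
print(split_keep_sep("a--b", "--"))                # ['a--', 'b']
```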


Re: Splitting on '^' ?

2009-08-14 Thread Brian
On Fri, Aug 14, 2009 at 2:23 PM, kj  wrote:

>
>
> Sometimes I want to split a string into lines, preserving the
> end-of-line markers.  In Perl this is really easy to do, by splitting
> on the beginning-of-line anchor:
>
>  @lines = split /^/, $string;
>
> But I can't figure out how to do the same thing with Python.  E.g.:
>
> >>> import re
> >>> re.split('^', 'spam\nham\neggs\n')
> ['spam\nham\neggs\n']
> >>> re.split('(?m)^', 'spam\nham\neggs\n')
> ['spam\nham\neggs\n']
> >>> bol_re = re.compile('^', re.M)
> >>> bol_re.split('spam\nham\neggs\n')
> ['spam\nham\neggs\n']
>
> Am I doing something wrong?
>
> kynn
> --
> http://mail.python.org/mailman/listinfo/python-list
>

You shouldn't use a regular expression for that.

>>> from time import time
>>> start=time();'spam\nham\neggs\n'.split('\n');print time()-start;
['spam', 'ham', 'eggs', '']
4.6968460083e-05
>>> import re
>>> start=time();re.split(r'\n', 'spam\nham\neggs');print time()-start;
['spam', 'ham', 'eggs']
0.000284910202026
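
A single time() delta around one call mostly measures noise, so here is a sketch of the same comparison with the standard timeit module. It assumes Python 3; the iteration count is arbitrary, and no particular timings are claimed since they vary by machine.

```python
import re
import timeit

text = "spam\nham\neggs\n"
newline_re = re.compile(r"\n")   # compiled once, outside the timed code

# Each callable is run many times so the per-call cost dominates the noise.
str_seconds = timeit.timeit(lambda: text.split("\n"), number=100_000)
re_seconds = timeit.timeit(lambda: newline_re.split(text), number=100_000)

print("str.split: %.4f s" % str_seconds)
print("re.split:  %.4f s" % re_seconds)
```

Both calls are given the same input here, so they return the same list and only the timing differs.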


Re: Splitting on '^' ?

2009-08-14 Thread Gabriel
On Fri, Aug 14, 2009 at 5:23 PM, kj wrote:
>
> >>> import re
> >>> re.split('^', 'spam\nham\neggs\n')
> ['spam\nham\neggs\n']
> >>> re.split('(?m)^', 'spam\nham\neggs\n')
> ['spam\nham\neggs\n']
> >>> bol_re = re.compile('^', re.M)
> >>> bol_re.split('spam\nham\neggs\n')
> ['spam\nham\neggs\n']
>
> Am I doing something wrong?
>

Maybe this:

>>> import re
>>> te = 'spam\nham\neggs\n'
>>> pat = '\n'
>>> re.split(pat,te)
['spam', 'ham', 'eggs', '']

-- 
Kind Regards


Re: Splitting on '^' ?

2009-08-14 Thread Tycho Andersen
On Fri, Aug 14, 2009 at 3:23 PM, kj wrote:
> [snip]
> >>> import re
> >>> re.split('^', 'spam\nham\neggs\n')
> ['spam\nham\neggs\n']
> >>> re.split('(?m)^', 'spam\nham\neggs\n')
> ['spam\nham\neggs\n']
> >>> bol_re = re.compile('^', re.M)
> >>> bol_re.split('spam\nham\neggs\n')
> ['spam\nham\neggs\n']
>
> Am I doing something wrong?

Why not just:

>>> re.split(r'\n', 'spam\nham\neggs')

\t


Re: Splitting on '^' ?

2009-08-14 Thread Gary Herron

kj wrote:

Sometimes I want to split a string into lines, preserving the
end-of-line markers.  In Perl this is really easy to do, by splitting
on the beginning-of-line anchor:

  @lines = split /^/, $string;

But I can't figure out how to do the same thing with Python.  E.g.:

  

import re
re.split('^', 'spam\nham\neggs\n')


['spam\nham\neggs\n']
  

re.split('(?m)^', 'spam\nham\neggs\n')


['spam\nham\neggs\n']
  

bol_re = re.compile('^', re.M)
bol_re.split('spam\nham\neggs\n')


['spam\nham\neggs\n']

Am I doing something wrong?
  

Just split on the EOL character:  the "\n":
re.split('\n', 'spam\nham\neggs\n')
['spam', 'ham', 'eggs', '']

The "^" and "$" characters do not match END-OF-LINE, but rather the  
END-OF-STRING, which was doing you no good.



Gary Herron





kynn
  




Splitting on '^' ?

2009-08-14 Thread kj


Sometimes I want to split a string into lines, preserving the
end-of-line markers.  In Perl this is really easy to do, by splitting
on the beginning-of-line anchor:

  @lines = split /^/, $string;

But I can't figure out how to do the same thing with Python.  E.g.:

>>> import re
>>> re.split('^', 'spam\nham\neggs\n')
['spam\nham\neggs\n']
>>> re.split('(?m)^', 'spam\nham\neggs\n')
['spam\nham\neggs\n']
>>> bol_re = re.compile('^', re.M)
>>> bol_re.split('spam\nham\neggs\n')
['spam\nham\neggs\n']

Am I doing something wrong?

kynn


Re: Reading files, splitting on a delimiter and newlines.

2007-07-26 Thread Bruno Desthuilliers
[EMAIL PROTECTED] a écrit :
> On Jul 25, 8:46 am, [EMAIL PROTECTED] wrote:
> 
>>Hello,
>>
>>I have a situation where I have a file that contains text similar to:
>>
>>myValue1 = contents of value1
>>myValue2 = contents of value2 but
>>with a new line here
>>myValue3 = contents of value3
>>
>>My first approach was to open the file, use readlines to split the
>>lines on the "=" delimiter into a key/value pair (to be stored in a
>>dict).
>>
>>After processing a couple files I noticed its possible that a newline
>>can be present in the value as shown in myValue2.
>>
>>In this case its not an option to say remove the newlines if its a
>>"multi line" value as the value data needs to stay intact.
>>
>>I'm a bit confused as how to go about getting this to work.
>>
>>Any suggestions on an approach would be greatly appreciated!
> 
> 
> 
> 
> Check the length of the list returned from split; this allows
> your to append to the previously extracted value if need be.
> 
> import StringIO
> import pprint
> 
> buf = """\
> myValue1 = contents of value1
> myValue2 = contents of value2 but
>with a new line here
> myValue3 = contents of value3
> """
> 
> mockfile = StringIO.StringIO(buf)
> 
> record=dict()
> 
> for line in mockfile:
>     kvpair = line.split('=', 2)

You want :
   kvpair = line.split('=', 1)

 >>> toto = "x = 42 = 33"
 >>> toto.split('=', 2)
['x ', ' 42 ', ' 33']


>     if len(kvpair) == 2:
>         key, value = kvpair
>         record[key] = value
>     else:
>         record[key] += line

Also, this won't handle the case where the first line doesn't contain an 
'='  (NameError, name 'key' is not defined)
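
To illustrate the maxsplit point, and as a hedged aside not raised in the thread, str.partition is another way to cut on the first '=' only. It always returns three parts, so there is no list length to check.

```python
toto = "x = 42 = 33"

# maxsplit=1 splits on the first "=" only, leaving later ones in the value.
print(toto.split("=", 1))                     # ['x ', ' 42 = 33']

# partition always returns a 3-tuple, even when "=" is absent.
key, sep, value = toto.partition("=")
print((key, sep, value))                      # ('x ', '=', ' 42 = 33')
print("no delimiter here".partition("="))     # ('no delimiter here', '', '')
```

An empty middle element signals that the delimiter was not found, which is the continuation-line case in this thread.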


Re: Reading files, splitting on a delimiter and newlines.

2007-07-26 Thread Bruno Desthuilliers
[EMAIL PROTECTED] a écrit :
> Hello,
> 
> I have a situation where I have a file that contains text similar to:
> 
> myValue1 = contents of value1
> myValue2 = contents of value2 but
> with a new line here
> myValue3 = contents of value3
> 
> My first approach was to open the file, use readlines to split the
> lines on the "=" delimiter into a key/value pair (to be stored in a
> dict).
> 
> After processing a couple files I noticed its possible that a newline
> can be present in the value as shown in myValue2.
> 
> In this case its not an option to say remove the newlines if its a
> "multi line" value as the value data needs to stay intact.
> 
> I'm a bit confused as how to go about getting this to work.
> 
> Any suggestions on an approach would be greatly appreciated!
> 

data = {}
key = None
for line in open('yourfile.txt'):
    line = line.strip()
    if not line:
        # skip empty lines
        continue
    if '=' in line:
        key, value = map(str.strip, line.split('=', 1))
        data[key] = value
    elif key is None:
        # first line without a '='
        raise ValueError("invalid format")
    else:
        # multiline
        data[key] += "\n" + line


print data
=> {'myValue3': 'contents of value3', 'myValue2': 'contents of value2 
but\nwith a new line here', 'myValue1': 'contents of value1'}
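
For anyone running this today, here is a sketch of the same logic as a Python 3 function that takes any iterable of lines instead of a hard-coded file name. The name parse_pairs is made up for this example; the sample data is the OP's.

```python
def parse_pairs(lines):
    """Collect 'key = value' lines, appending continuation lines to the last key."""
    data = {}
    key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # skip empty lines
        if "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            data[key] = value
        elif key is None:
            raise ValueError("continuation line before any key")
        else:
            data[key] += "\n" + line      # multi-line value
    return data


sample = """\
myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3
"""
record = parse_pairs(sample.splitlines())
print(record)
```

As with the original, a continuation line that happens to contain '=' is treated as a new key, so this only suits data where that cannot occur.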

HTH


Re: Reading files, splitting on a delimiter and newlines.

2007-07-25 Thread Hendrik van Rooyen
: <[EMAIL PROTECTED]> Wrote:

> On Jul 25, 10:46 am, [EMAIL PROTECTED] wrote:
> > Hello,
> >
> > I have a situation where I have a file that contains text similar to:
> >
> > myValue1 = contents of value1
> > myValue2 = contents of value2 but
> > with a new line here
> > myValue3 = contents of value3
> >
> > My first approach was to open the file, use readlines to split the
> > lines on the "=" delimiter into a key/value pair (to be stored in a
> > dict).
> >
> > After processing a couple files I noticed its possible that a newline
> > can be present in the value as shown in myValue2.
> >
> > In this case its not an option to say remove the newlines if its a
> > "multi line" value as the value data needs to stay intact.
> >
> > I'm a bit confused as how to go about getting this to work.
> >
> > Any suggestions on an approach would be greatly appreciated!
> 
> I'm confused. You don't want the newline to be present, but you can't
> remove it because the data has to stay intact? If you don't want to
> change it, then what's the problem?

I think the OP's trouble is that the value he wants gets split up by the
newline at the end of the line when he uses readline().

One can try adding the single value to the previous value in the previous
key/value pair when the split does not yield two values - a bit hackish,
but given structured input data it might work.

- Hendrik



Re: Reading files, splitting on a delimiter and newlines.

2007-07-25 Thread chrispwd
On Jul 25, 7:56 pm, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:
> On Jul 25, 8:46 am, [EMAIL PROTECTED] wrote:
>
>
>
> > Hello,
>
> > I have a situation where I have a file that contains text similar to:
>
> > myValue1 = contents of value1
> > myValue2 = contents of value2 but
> > with a new line here
> > myValue3 = contents of value3
>
> > My first approach was to open the file, use readlines to split the
> > lines on the "=" delimiter into a key/value pair (to be stored in a
> > dict).
>
> > After processing a couple files I noticed its possible that a newline
> > can be present in the value as shown in myValue2.
>
> > In this case its not an option to say remove the newlines if its a
> > "multi line" value as the value data needs to stay intact.
>
> > I'm a bit confused as how to go about getting this to work.
>
> > Any suggestions on an approach would be greatly appreciated!
>
> Check the length of the list returned from split; this allows
> your to append to the previously extracted value if need be.
>
> import StringIO
> import pprint
>
> buf = """\
> myValue1 = contents of value1
> myValue2 = contents of value2 but
>with a new line here
> myValue3 = contents of value3
> """
>
> mockfile = StringIO.StringIO(buf)
>
> record=dict()
>
> for line in mockfile:
>     kvpair = line.split('=', 2)
>     if len(kvpair) == 2:
>         key, value = kvpair
>         record[key] = value
>     else:
>         record[key] += line
>
> pprint.pprint(record)
>
> # lstrip() to remove newlines if needed ...
>
> --
> Hope this helps,
> Steven

Great thank you! That was the logic I was looking for.



Re: Reading files, splitting on a delimiter and newlines.

2007-07-25 Thread [EMAIL PROTECTED]
On Jul 25, 8:46 am, [EMAIL PROTECTED] wrote:
> Hello,
>
> I have a situation where I have a file that contains text similar to:
>
> myValue1 = contents of value1
> myValue2 = contents of value2 but
> with a new line here
> myValue3 = contents of value3
>
> My first approach was to open the file, use readlines to split the
> lines on the "=" delimiter into a key/value pair (to be stored in a
> dict).
>
> After processing a couple files I noticed its possible that a newline
> can be present in the value as shown in myValue2.
>
> In this case its not an option to say remove the newlines if its a
> "multi line" value as the value data needs to stay intact.
>
> I'm a bit confused as how to go about getting this to work.
>
> Any suggestions on an approach would be greatly appreciated!



Check the length of the list returned from split; this allows
you to append to the previously extracted value if need be.

import StringIO
import pprint

buf = """\
myValue1 = contents of value1
myValue2 = contents of value2 but
   with a new line here
myValue3 = contents of value3
"""

mockfile = StringIO.StringIO(buf)

record=dict()

for line in mockfile:
    kvpair = line.split('=', 2)
    if len(kvpair) == 2:
        key, value = kvpair
        record[key] = value
    else:
        record[key] += line

pprint.pprint(record)

# lstrip() to remove newlines if needed ...

--
Hope this helps,
Steven



Re: Reading files, splitting on a delimiter and newlines.

2007-07-25 Thread John Machin
On Jul 26, 3:08 am, Stargaming <[EMAIL PROTECTED]> wrote:
> On Wed, 25 Jul 2007 09:16:26 -0700, kyosohma wrote:
> > On Jul 25, 10:46 am, [EMAIL PROTECTED] wrote:
> >> Hello,
>
> >> I have a situation where I have a file that contains text similar to:
>
> >> myValue1 = contents of value1
> >> myValue2 = contents of value2 but
> >> with a new line here
> >> myValue3 = contents of value3
>
> >> My first approach was to open the file, use readlines to split the
> >> lines on the "=" delimiter into a key/value pair (to be stored in a
> >> dict).
>
> >> After processing a couple files I noticed its possible that a newline
> >> can be present in the value as shown in myValue2.
>
> >> In this case its not an option to say remove the newlines if its a
> >> "multi line" value as the value data needs to stay intact.
>
> >> I'm a bit confused as how to go about getting this to work.
>
> >> Any suggestions on an approach would be greatly appreciated!
>
> > I'm confused. You don't want the newline to be present, but you can't
> > remove it because the data has to stay intact? If you don't want to
> > change it, then what's the problem?
>
> > Mike
>
> It's obvious that simple line-by-line filtering won't handle multi-line
> statements.
>
> You could solve that by saving the last item you added something to and,
> if the line currently being handled doesn't look like an assignment, append it
> to this item. You might run into problems with such data:
>
>   foo = modern maths
>   proved that 1 = 1
>   bar = single
>
> If your dataset always has indentation on subsequent lines, you might use
> this. Or if the key's name is always just one word.
>

My take: all of the above, plus: Given that you want to extract stuff
of the form <LHS> = <RHS>, I'd suggest developing a fairly precise
regular expression for LHS, maybe even for RHS, and trying this on as
many of these files as you can.

Why an RE for RHS? Consider:

foo = somebody said "I think that
REs = trouble
maybe_better = pyparsing"

:-)
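
A rough sketch of the "precise regular expression for LHS" idea, under the assumption that a key is a single identifier-like word at the start of a line. ASSIGNMENT_RE and parse_strict are hypothetical names; the test data is Stargaming's.

```python
import re

# Assumption: a key is a single identifier-like word at the start of the line.
ASSIGNMENT_RE = re.compile(r"^([A-Za-z_]\w*)\s*=\s*(.*)$")


def parse_strict(lines):
    """Start a new key only when the line matches ASSIGNMENT_RE."""
    data = {}
    key = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = ASSIGNMENT_RE.match(line)
        if match:
            key, value = match.groups()
            data[key] = value
        elif key is not None:
            data[key] += "\n" + line      # continuation of the previous value
        else:
            raise ValueError("continuation line before any key")
    return data


record = parse_strict(["foo = modern maths", "proved that 1 = 1", "bar = single"])
print(record)   # {'foo': 'modern maths\nproved that 1 = 1', 'bar': 'single'}
```

This handles "proved that 1 = 1" because the text before the '=' is not a single word, but it would still misread the quoted "REs = trouble" line above; that case needs a pattern (or parser) for the right-hand side as well.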



Re: Reading files, splitting on a delimiter and newlines.

2007-07-25 Thread Stargaming
On Wed, 25 Jul 2007 09:16:26 -0700, kyosohma wrote:

> On Jul 25, 10:46 am, [EMAIL PROTECTED] wrote:
>> Hello,
>>
>> I have a situation where I have a file that contains text similar to:
>>
>> myValue1 = contents of value1
>> myValue2 = contents of value2 but
>> with a new line here
>> myValue3 = contents of value3
>>
>> My first approach was to open the file, use readlines to split the
>> lines on the "=" delimiter into a key/value pair (to be stored in a
>> dict).
>>
>> After processing a couple files I noticed its possible that a newline
>> can be present in the value as shown in myValue2.
>>
>> In this case its not an option to say remove the newlines if its a
>> "multi line" value as the value data needs to stay intact.
>>
>> I'm a bit confused as how to go about getting this to work.
>>
>> Any suggestions on an approach would be greatly appreciated!
> 
> I'm confused. You don't want the newline to be present, but you can't
> remove it because the data has to stay intact? If you don't want to
> change it, then what's the problem?
> 
> Mike

It's obvious that simple line-by-line filtering won't handle multi-line 
statements.

You could solve that by saving the last item you added something to and, 
if the line currently being handled doesn't look like an assignment, append it 
to this item. You might run into problems with such data:

  foo = modern maths
  proved that 1 = 1
  bar = single

If your dataset always has indentation on subsequent lines, you might use 
this. Or if the key's name is always just one word.
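
A sketch of the indentation variant, assuming that continuation lines always start with whitespace and key lines never do. The function name parse_indented and the sample lines are made up for illustration.

```python
def parse_indented(lines):
    """Treat lines that start with whitespace as continuations of the previous key."""
    data = {}
    key = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue                          # skip blank lines
        if line[0] in " \t":
            if key is None:
                raise ValueError("continuation line before any key")
            data[key] += "\n" + line.strip()
        else:
            # Unindented line: everything before the first "=" is the key.
            key, _, value = line.partition("=")
            key = key.strip()
            data[key] = value.strip()
    return data


record = parse_indented([
    "foo = modern maths\n",
    "  proved that 1 = 1\n",
    "bar = single\n",
])
print(record)   # {'foo': 'modern maths\nproved that 1 = 1', 'bar': 'single'}
```

An unindented line with no '=' becomes a key with an empty value here; whether that should be an error instead depends on the data.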

HTH,
Stargaming


Re: Reading files, splitting on a delimiter and newlines.

2007-07-25 Thread kyosohma
On Jul 25, 10:46 am, [EMAIL PROTECTED] wrote:
> Hello,
>
> I have a situation where I have a file that contains text similar to:
>
> myValue1 = contents of value1
> myValue2 = contents of value2 but
> with a new line here
> myValue3 = contents of value3
>
> My first approach was to open the file, use readlines to split the
> lines on the "=" delimiter into a key/value pair (to be stored in a
> dict).
>
> After processing a couple files I noticed its possible that a newline
> can be present in the value as shown in myValue2.
>
> In this case its not an option to say remove the newlines if its a
> "multi line" value as the value data needs to stay intact.
>
> I'm a bit confused as how to go about getting this to work.
>
> Any suggestions on an approach would be greatly appreciated!

I'm confused. You don't want the newline to be present, but you can't
remove it because the data has to stay intact? If you don't want to
change it, then what's the problem?

Mike



Reading files, splitting on a delimiter and newlines.

2007-07-25 Thread chrispwd
Hello,

I have a situation where I have a file that contains text similar to:

myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3

My first approach was to open the file, use readlines to split the
lines on the "=" delimiter into a key/value pair (to be stored in a
dict).

After processing a couple files I noticed its possible that a newline
can be present in the value as shown in myValue2.

In this case its not an option to say remove the newlines if its a
"multi line" value as the value data needs to stay intact.

I'm a bit confused as how to go about getting this to work.

Any suggestions on an approach would be greatly appreciated!



Re: Splitting on a word

2005-07-14 Thread qwweeeit
Hi Bernhard,
firstly you must excuse my English ("angry" is a little ...strong, but
my vocabulary is limited). I hope that the experts keep on helping us
newbie.
Also if I am a newbie (in Python), I disagree with you: my solution
(with the help of Joe) answers to the problem of splitting a string
using a delimiter of more than one character (sometimes a word as
delimiter, but it is not required).
The code I supplied can be misleading because is centered in web
parsing, but my request is more general (Next time I will only make the
question without examples!)
If I were a professional programmer I could agree with you and the
"Batteries included" concept and all the other considerations
("off-the-shelf solutions" and ...not reinventing the wheel).
Also the terrific example you supply in order to caution me not to
follow dully (found in the dictionary) the "simple & short" concept,
doesn't apply to me (too complicated!).
I am so far from a real programmer that when an error occurs, I use
try/except (if they solve the problem) without caring of the sources of
the mistake, ...EAFP!).
So I don't care too much of possible future mistakes (also if the code
takes into account capital letters).
For the specific case I mentioned, actually if the closing tag ">" is
missing perhaps I obtain wrong results... I will worry when necessary
(also if the Murphy law...).
Bye.



Re: Splitting on a word

2005-07-14 Thread Bernhard Holzmayer
[EMAIL PROTECTED] wrote:

> Bernard... don't get angry, but I prefer the solution of Joe.  

Oh. If I got angry in such a case, I would have stopped responding to such
posts long ago 
You know the background... and you'll have to bear the consequences. ;-) 

> ... 
> for me "pythonic" means simple and short (I may be wrong...).

It's your definition, isn't it? 
One of the most important advantages of Python (for me!) besides its
readability is that it comes with "Batteries included", which means, that I
can benefit of the work others did before, and that I can rely on its
quality.

The solution which I proposed is nothing but the test code from htmllib,
stripped down to the absolute minimum, enriched with the print command
to show the anchor list.

If I had to write production-level code of your sort, I'd take such an
off-the-shelf solution, because it minimizes the risk of failures.

Think only of such issues like these:
- does your code find a tag like  or references with/without " ...?
- does it survive ill-coded html after all?

I've made the experience that it's usually better to rely on such
"library" code than to reinvent the wheel.

There's often a reason to take another approach.
I'd agree that a simple and short solution is fascinating.
However, every simple and short solution should be readable.
As a terrific example, here's a very tiny piece of code,
which does nothing but calculate the prime numbers up to 1000:

print filter(None,map(lambda y:y*reduce(lambda x,y:x*y!=0,
   map(lambda x,y=y:y%x,range(2,int(pow(y,0.5)+1))),1),
   range(2,1000)))

- simple (depends on your familiarity with stuff like map and lambda) 
- short (compared with different solutions) 
- and veeeyyy pythonic!
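
For contrast, a readable version of the same computation might look like
the sketch below. It is written for current Python 3 (math.isqrt needs 3.8
or later) rather than the Python 2 of the one-liner, and the helper names
is_prime and primes_below are made up for illustration. It uses the same
idea: trial division by every candidate up to the square root.

```python
import math

def is_prime(n):
    """Return True if n is prime, using trial division up to sqrt(n)."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

def primes_below(limit):
    """Return all primes p with 2 <= p < limit."""
    return [n for n in range(2, limit) if is_prime(n)]

print(primes_below(1000))
```

It is longer, but each step can be read and tested on its own.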

Bernhard



Re: Splitting on a word

2005-07-14 Thread qwweeeit
Hi all,
thanks for your contributions. To Robert Kern I can reply that I know
BeautifulSoup, but my code was meant to be a "generalization" (only
incidentally used in a web parsing application). The fact is that,
being a "macho newbie" programmer (the "macho" is from Steven
D'Aprano), I wanted to show what beautiful solutions I can find...
Luckily there is Joe, who shows me that most of my "beautiful" code
(working, of course!) can be replaced by:
list=p.split(s)
Bernard... don't get angry, but I prefer the solution of Joe. It is
more general, and, besides that, for me "pythonic" means simple and
short (I may be wrong...).
By the way, I have found an alternative solution to the problem of
"unique" lists, without sorting, but not being "macho" enough...
Bye.



Re: Splitting on a word

2005-07-14 Thread Bernhard Holzmayer
[EMAIL PROTECTED] wrote:

> Hi all,
> I am writing a script to visualize (and print)
> the web references hidden in the html files as:
> ' underlined reference'
> Optimizing my code, I found that an essential step is:
> splitting on a word (in this case 'href').
> 
> I am asking if there is some alternative (more pythonic...):

Sure. The htmllib module provides HTMLParser.
Here's an example; run it with your HTML file as argument
and you'll see a list of all hrefs in the document.

#
#!/usr/bin/python
import htmllib

def test():
    import sys, formatter

    file = sys.argv[1]
    f = open(file, 'r')
    data = f.read()
    f.close()

    f = formatter.NullFormatter()
    p = htmllib.HTMLParser(f)
    p.feed(data)

    for a_link in p.anchorlist:
        print a_link

    p.close()

test()
#

I'm sure that this is far more Pythonic!
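
Note that htmllib exists only in Python 2; it was removed in Python 3. On a
newer interpreter, roughly the same idea can be sketched with the standard
html.parser module. The class name AnchorCollector and the sample string
below are invented for illustration. This is not the htmllib test code,
only the same approach of letting a library parser collect the href values.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names,
        # so <A HREF=...> is matched as well.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.anchorlist.append(value)

# Invented sample input; a real script would read a file instead.
sample = ('<p>see <a href="http://www.python.org">python</a> '
          'and <A HREF=docs.html>docs</A></p>')

parser = AnchorCollector()
parser.feed(sample)
parser.close()

for a_link in parser.anchorlist:
    print(a_link)
```

The parser deals with letter case and with quoted or unquoted attribute
values, which a plain split on 'href' does not.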

Bernhard


Re: Splitting on a word

2005-07-13 Thread Joe
import re

# string s simulating an html file
s='ffy: ytrty python fyt wx  dtrtf'
p=re.compile(r'\bhref\b',re.I)

list=p.split(s)  # <-- gets you your final list.

good luck,

Joe



Re: Splitting on a word

2005-07-13 Thread Steven D'Aprano
On Wed, 13 Jul 2005 06:19:54 -0700, qwweeeit wrote:

> Hi all,
> I am writing a script to visualize (and print)
> the web references hidden in the html files as:
> ' underlined reference'
> Optimizing my code, 

[red rag to bull]
Because it was too slow? Or just to prove what a macho programmer you are?

Is your code even working yet? If it isn't working, you shouldn't be
trying to optimize buggy code.


> I found that an essential step is:
> splitting on a word (in this case 'href').

Then just do it:

py> ' underlined reference'.split('href')
[' underlined reference']

If you are concerned about case issues, you can either convert the
entire HTML file to lowercase, or you might write a case-insensitive
regular expression to replace any "href" regardless of case with the
lowercase version.
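
The second option could look roughly like the sketch below. The sample
string is invented, since any real page will differ. re.sub with
re.IGNORECASE rewrites every spelling of "href" to lowercase, after which a
plain str.split does the job.

```python
import re

# Invented sample; stands in for the contents of an HTML file.
html = '<a href="a.html">one</a> <A HREF="b.html">two</A>'

# Replace "href" in any letter case with the lowercase spelling.
normalized = re.sub(r'href', 'href', html, flags=re.IGNORECASE)

# Now an ordinary string split is enough.
parts = normalized.split('href')
print(parts)
```

Unlike lowercasing the whole file, this leaves the rest of the text,
including the URLs, in its original case.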

[snip]

> To be sure as delimiter I choose chr(127)
> which surely is not present in the html file.

I wouldn't bet my life on that. I've found some weird characters in HTML
files.


-- 
Steven.



Re: Splitting on a word

2005-07-13 Thread Robert Kern
[EMAIL PROTECTED] wrote:
> Hi all,
> I am writing a script to visualize (and print)
> the web references hidden in the html files as:
> ' underlined reference'
> Optimizing my code, I found that an essential step is:
> splitting on a word (in this case 'href').
> 
> I am asking if there is some alternative (more pythonic...):

For *this* particular task, certainly. It begins with

   import BeautifulSoup

The rest is left as a (brief) exercise for the reader.  :-)

As for the more general task of splitting strings using regular 
expressions, see re.split().
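
A minimal sketch of re.split() on an invented sample string follows. The
flags argument makes the match case-insensitive, and wrapping the pattern
in a capturing group keeps the matched delimiters in the result.

```python
import re

# Invented sample text with the delimiter word in two spellings.
text = 'alpha HREF beta href gamma'

# Split on the whole word "href" in any letter case.
pieces = re.split(r'\bhref\b', text, flags=re.IGNORECASE)
print(pieces)

# A capturing group makes re.split return the delimiters as well.
with_delims = re.split(r'\b(href)\b', text, flags=re.IGNORECASE)
print(with_delims)
```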

-- 
Robert Kern
[EMAIL PROTECTED]

"In the fields of hell where the grass grows high
  Are the graves of dreams allowed to die."
   -- Richard Harter



Splitting on a word

2005-07-13 Thread qwweeeit
Hi all,
I am writing a script to visualize (and print)
the web references hidden in the html files as:
' underlined reference'
Optimizing my code, I found that an essential step is:
splitting on a word (in this case 'href').

I am asking if there is some alternative (more pythonic...):

# SplitMultichar.py

import re

# string s simulating an html file
s='ffy: ytrty python fyt wx  dtrtf'
p=re.compile(r'\bhref\b',re.I)

lHref=p.findall(s)  # lHref=['href','HREF']
# for normal html files the lHref list has more elements
# (more web references)

c='~' # char to be used as delimiter
# c=chr(127) # char to be used as delimiter
for i in lHref:
    s=s.replace(i,c)

# s ='ffy: ytrty python fyt wx  dtrtf'

list=s.split(c)
# list=['ffy: ytrty python fyt wx  dtrtf']
#=-

If you save the original s string to xxx.html, any browser
can visualize it.
To be sure as delimiter I choose chr(127)
which surely is not present in the html file.
Bye.
