Re: Cutting slices

2023-03-06 Thread Christian Gollwitzer

On 05.03.23 at 23:43, Stefan Ram wrote:

   The following behaviour of Python strikes me as being a bit
   "irregular". A user tries to chop off sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it. 


OK, so if you want to use an RE for splitting, can you not use
re.split()? It basically works like the built-in splitting in AWK.


>>> s='alphaAbetaBgamma'
>>> import re
>>> re.split(r'A|B|C', s)
['alpha', 'beta', 'gamma']
>>>


Christian
--
https://mail.python.org/mailman/listinfo/python-list


RE: Cutting slices

2023-03-05 Thread avi.e.gross
I am not commenting on the technique or why it is chosen, just on the part
where the last search looks for a non-existent period:

s = 'alpha.beta.gamma'
...
s[ 11: s.find( '.', 11 )]

What should "find" do if it hits the end of a string without finding the
period you claim is a divider?

Could that be why gamma got truncated?

Unless you can arrange for a terminal period, maybe you can reconsider the
approach.


-----Original Message-----
From: Python-list On Behalf Of aapost
Sent: Sunday, March 5, 2023 6:00 PM
To: python-list@python.org
Subject: Re: Cutting slices

On 3/5/23 17:43, Stefan Ram wrote:
>The following behaviour of Python strikes me as being a bit
>"irregular". A user tries to chop off sections from a string,
>but does not use "split" because the separator might become
>more complicated so that a regular expression will be required
>to find it. But for now, let's use a simple "find":
>
> |>>> s = 'alpha.beta.gamma'
> |>>> s[ 0: s.find( '.', 0 )]
> |'alpha'
> |>>> s[ 6: s.find( '.', 6 )]
> |'beta'
> |>>> s[ 11: s.find( '.', 11 )]
> |'gamm'
> |>>>
>
>The user always inserted the position of the previous find plus
>one to start the next "find", so he uses "0", "6", and "11".
>But the "a" is missing from the final "gamma"!
>
>And it seems that there is no numerical value at all that
>one can use for "n" in "string[ 0: n ]" to get the whole
>string, is there?
> 
> 

I would agree with the 1st part of the comment.

Just noting that string[11:], string[11:None], as well as string[11:16] 
work ... as well as string[11:324242]... lol..


Re: Cutting slices

2023-03-05 Thread Greg Ewing via Python-list

On 6/03/23 11:43 am, Stefan Ram wrote:

   A user tries to chop off sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it.


What's wrong with re.split() in that case?

--
Greg


Re: Cutting slices

2023-03-05 Thread MRAB

On 2023-03-06 00:28, dn via Python-list wrote:

On 06/03/2023 11.59, aapost wrote:

On 3/5/23 17:43, Stefan Ram wrote:

   The following behaviour of Python strikes me as being a bit
   "irregular". A user tries to chop off sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it. But for now, let's use a simple "find":

|>>> s = 'alpha.beta.gamma'
|>>> s[ 0: s.find( '.', 0 )]
|'alpha'
|>>> s[ 6: s.find( '.', 6 )]
|'beta'
|>>> s[ 11: s.find( '.', 11 )]
|'gamm'
|>>>

   The user always inserted the position of the previous find plus
   one to start the next "find", so he uses "0", "6", and "11".
   But the "a" is missing from the final "gamma"!
   And it seems that there is no numerical value at all that
   one can use for "n" in "string[ 0: n ]" to get the whole
   string, is there?




I would agree with the 1st part of the comment.

Just noting that string[11:], string[11:None], as well as string[11:16] 
work ... as well as string[11:324242]... lol..


To expand on the above, answering the OP's second question: the numeric
value is len( s ).

If the repetitive process is required, try a loop like:

  >>> start_index = 11  # to cure the issue raised

  >>> try:
  ...     s[ start_index:s.index( '.', start_index ) ]
  ... except ValueError:
  ...     s[ start_index:len( s ) ]
  ...
  'gamma'


Somewhat off-topic, but...

When there was a discussion about a None-coalescing operator, I thought 
that it would've been nice if .find and .rfind returned None instead of -1.


There have been times when I've wanted to find the next space (or 
whatever) and have it return the length of the string if absent. That 
could've been accomplished with:


s.find(' ', pos) ?? len(s)

Other times I've wanted it to return -1. That could've been accomplished 
with:


s.find(' ', pos) ?? -1

(There's a place in the re module where .rfind returning -1 is just the 
right value.)


In this instance, slicing with None as the end is just what's wanted.

Ah, well...
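
Without a None-coalescing operator, the same effect can be sketched in
current Python with a small helper. The name find_or and its signature
are made up for illustration; it only wraps str.find and swaps the -1
"not found" result for whatever default the caller wants (None, len(s),
or -1 itself).

```python
def find_or(s, sub, start, default):
    """Like s.find(sub, start), but return `default` when `sub` is absent."""
    pos = s.find(sub, start)
    # str.find reports "not found" as -1; substitute the caller's default.
    return default if pos == -1 else pos


s = 'alpha.beta.gamma'

# None as the slice end means "up to the end of the string".
print(s[11:find_or(s, '.', 11, None)])   # gamma

# len(s) as the default gives the same slice with a numeric end.
print(s[11:find_or(s, '.', 11, len(s))])  # gamma
```

With a default of -1 the helper behaves exactly like plain find().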


Re: Cutting slices

2023-03-05 Thread Rob Cliffe via Python-list



On 05/03/2023 22:59, aapost wrote:

On 3/5/23 17:43, Stefan Ram wrote:

   The following behaviour of Python strikes me as being a bit
   "irregular". A user tries to chop off sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it. But for now, let's use a simple "find":

|>>> s = 'alpha.beta.gamma'
|>>> s[ 0: s.find( '.', 0 )]
|'alpha'
|>>> s[ 6: s.find( '.', 6 )]
|'beta'
|>>> s[ 11: s.find( '.', 11 )]
|'gamm'
|>>>

   The user always inserted the position of the previous find plus
   one to start the next "find", so he uses "0", "6", and "11".
   But the "a" is missing from the final "gamma"!
   And it seems that there is no numerical value at all that
   one can use for "n" in "string[ 0: n ]" to get the whole
   string, is there?





The final `find` returns -1 because there is no separator after 'gamma'.
So you are asking for
    s[ 11 : -1]
which correctly returns 'gamm'.
You need to test for this condition.
Alternatively you could ensure that there is a final separator:
    s = 'alpha.beta.gamma.'
but you would still need to test when the string was exhausted.
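
A rough sketch of the "final separator" variant, assuming a
single-character separator (with longer separators the padding can
create a false match). The function name fields_with_sentinel is
invented for this example. Because the padded string always ends with
the separator, find never returns -1 inside the loop, and the loop
condition is the test for the string being exhausted.

```python
def fields_with_sentinel(s, sep='.'):
    """Split `s` on a one-character `sep` by first appending a final separator."""
    if len(sep) != 1:
        raise ValueError('this sketch assumes a single-character separator')
    padded = s + sep          # guarantee a separator after the last field
    result = []
    start = 0
    while start < len(padded):            # stop once the string is exhausted
        end = padded.find(sep, start)     # cannot be -1: padded ends with sep
        result.append(padded[start:end])
        start = end + 1                   # previous find position plus one
    return result


print(fields_with_sentinel('alpha.beta.gamma'))  # ['alpha', 'beta', 'gamma']
```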
Best wishes
Rob Cliffe


Re: Cutting slices

2023-03-05 Thread dn via Python-list

On 06/03/2023 11.59, aapost wrote:

On 3/5/23 17:43, Stefan Ram wrote:

   The following behaviour of Python strikes me as being a bit
   "irregular". A user tries to chop off sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it. But for now, let's use a simple "find":

|>>> s = 'alpha.beta.gamma'
|>>> s[ 0: s.find( '.', 0 )]
|'alpha'
|>>> s[ 6: s.find( '.', 6 )]
|'beta'
|>>> s[ 11: s.find( '.', 11 )]
|'gamm'
|>>>

   The user always inserted the position of the previous find plus
   one to start the next "find", so he uses "0", "6", and "11".
   But the "a" is missing from the final "gamma"!
   And it seems that there is no numerical value at all that
   one can use for "n" in "string[ 0: n ]" to get the whole
   string, is there?




I would agree with the 1st part of the comment.

Just noting that string[11:], string[11:None], as well as string[11:16] 
work ... as well as string[11:324242]... lol..


To expand on the above, answering the OP's second question: the numeric 
value is len( s ).


If the repetitive process is required, try a loop like:

>>> start_index = 11  # to cure the issue raised

>>> try:
...     s[ start_index:s.index( '.', start_index ) ]
... except ValueError:
...     s[ start_index:len( s ) ]
...
'gamma'
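
Wrapped in an actual loop, the same try/except idea might look like the
sketch below. The generator name iter_fields is hypothetical; it walks
the string left to right with str.index and uses len( s ) as the end of
the last slice once ValueError signals that no separator is left.

```python
def iter_fields(s, sep='.'):
    """Yield the sections of `s` delimited by `sep`, left to right."""
    if not sep:
        raise ValueError('empty separator')
    start_index = 0
    while True:
        try:
            end_index = s.index(sep, start_index)
        except ValueError:
            # No separator left: len(s) is the numeric end of the final slice.
            yield s[start_index:len(s)]
            return
        yield s[start_index:end_index]
        start_index = end_index + len(sep)  # resume just past the separator


print(list(iter_fields('alpha.beta.gamma')))  # ['alpha', 'beta', 'gamma']
```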


However, if the objective is to split, then use the function built for 
the purpose:


>>> s.split( "." )
['alpha', 'beta', 'gamma']

(yes, the OP says this won't work - but doesn't show why)


If life must be more complicated, but the next separator can be
predicted, then its close relative is partition().
NB: both split() and partition() can be used on the sub-strings produced
by an earlier split() or ..., i.e. there may be no reason to work strictly
from left to right.
I can't really help further with this, because the information above only
shows multiple "." characters, and not how multiple separators might be
interpreted.
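
As a small illustration of partition(), assuming the same "." separator
as in the OP's example (the function name split_with_partition is made
up here): partition() returns a 3-tuple of head, separator, and rest,
and the middle element is an empty string when the separator is absent,
which gives a natural stopping test.

```python
def split_with_partition(s, sep='.'):
    """Peel fields off the front of `s` one at a time with str.partition."""
    parts = []
    rest = s
    while True:
        head, found, rest = rest.partition(sep)
        parts.append(head)
        if not found:   # '' in the middle slot means no separator was found
            break
    return parts


s = 'alpha.beta.gamma'
print(split_with_partition(s))  # ['alpha', 'beta', 'gamma']

# rpartition() works from the right-hand end instead.
print(s.rpartition('.'))        # ('alpha.beta', '.', 'gamma')
```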



A straight-line approach might be to use maketrans() and translate() to
convert all the separators to a single character, e.g. white-space, which
can then be split using any of the previously-mentioned methods.
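
A sketch of that idea, under the assumption that the separators are the
three single characters ".", ";" and "," (the OP never said which ones
are needed, so this set and the sample text are invented) and that the
fields themselves contain no white-space:

```python
text = 'alpha.beta;gamma,delta'   # made-up sample with mixed separators
separators = '.;,'                # assumed set of single-character separators

# Map every separator character to a space, then let split() do the rest.
table = str.maketrans(separators, ' ' * len(separators))
normalized = text.translate(table)

print(normalized)          # alpha beta gamma delta
print(normalized.split())  # ['alpha', 'beta', 'gamma', 'delta']
```

Note that split() with no argument also collapses runs of white-space,
so empty fields between adjacent separators would be lost.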



If the problem is sufficiently complicated and the OP is prepared to go
whole-hog, then the Python Standard Library's tokenize module or various
parser libraries may be worth consideration...


--
Regards,
=dn


Re: Cutting slices

2023-03-05 Thread aapost

On 3/5/23 17:43, Stefan Ram wrote:

   The following behaviour of Python strikes me as being a bit
   "irregular". A user tries to chop off sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it. But for now, let's use a simple "find":

|>>> s = 'alpha.beta.gamma'
|>>> s[ 0: s.find( '.', 0 )]
|'alpha'
|>>> s[ 6: s.find( '.', 6 )]
|'beta'
|>>> s[ 11: s.find( '.', 11 )]
|'gamm'
|>>>

   The user always inserted the position of the previous find plus
   one to start the next "find", so he uses "0", "6", and "11".
   But the "a" is missing from the final "gamma"!
   And it seems that there is no numerical value at all that
   one can use for "n" in "string[ 0: n ]" to get the whole
   string, is there?




I would agree with the 1st part of the comment.

Just noting that string[11:], string[11:None], as well as string[11:16] 
work ... as well as string[11:324242]... lol..
