Re: string processing - some problems whenever I have to parse a more complex string

Terry Reedy Tue, 21 Oct 2014 15:06:13 -0700

On 10/21/2014 10:32 AM, CWr wrote:


> Hello together,
>
> currently I have to parse a string in an atomic way. Normally - in this case
> too - I have a counter variable to keep the current position inside the string.
> So far, I think this is the most flexible solution for doing some lookarounds
> inside the string if necessary. Subroutines are fed the underlying data
> and the current position. A subroutine returns a tuple of the new position and
> the result. But I would like to process subroutines with the same flexibility
> (slicing/lookaround) but without returning the new position every time.
>
> Is there any implementation like the C++ StringPiece class?

I am going to guess that this is a string view class that encapsulates a piece of an underlying string. Otherwise there is no point.

A view class depends on a primary, independently accessible class for its data. There are two main categories. A subview gives the primary class interface to a part of the primary data. Numpy has array subviews and I presume you are talking about string subviews here. An altview class gives an alternative interface to the primary data. Dict views are examples.
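As a small illustration of the two categories using only built-in types: a dict keys view serves as the altview, and a memoryview slice of a bytearray stands in for a subview (it is a view of bytes, not of str, so it is only an analogy to a string subview). The variable names are made up for this sketch.

```python
# Altview: a keys view is a live, set-like interface onto the dict's data.
d = {'a': 1, 'b': 2}
keys = d.keys()
d['c'] = 3                    # change the primary object after taking the view
keys_after = sorted(keys)     # the view reflects the later insertion

# Subview: a memoryview slice shares the buffer of the underlying bytearray.
data = bytearray(b'abcdef')
sub = memoryview(data)[1:4]   # no bytes are copied here
data[1] = ord('X')            # change the primary data in place
sub_bytes = bytes(sub)        # the slice sees the change made through `data`

print(keys_after)
print(sub_bytes)
```

Both views stay in step with the primary object because neither holds its own copy of the data.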

If the primary object is mutable, one reason to use a view instead of a copy is to keep the data for two objects synchronized. This does not apply to strings.

Another reason is to save memory space. The downside is that the primary data cannot be erased until *both* objects are deleted. Moreover, if the primary data is small or the subview data is a small fraction of the primary data, the memory saving is small. So small subviews that persist after the primary object may end up costing more memory than they save. This is one reason Python does not have string subviews. The numpy array view use case is large subarrays of large arrays that have to persist through a calculation anyway.

Another reason Python lacks sequence subviews is that the extra data needed for a contiguous slice are only the start and stop indexes. These can easily be manipulated directly without wrapping them in a class. And anyone who does want a method interface can easily create a class to their liking.
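For anyone who does want the method interface, here is a minimal sketch of such a class. It is not an existing library class: the name StringSlice and the chop method are taken from the question, and everything else (the attribute names, the error handling, delegating startswith to the underlying str) is an assumption made for illustration.

```python
class StringSlice:
    """Hypothetical string view: one str plus start/stop indexes."""

    def __init__(self, text, start=0, stop=None):
        self._text = text
        self._start = start
        self._stop = len(text) if stop is None else stop

    def __len__(self):
        return self._stop - self._start

    def __str__(self):
        # Copies only the visible part of the underlying string.
        return self._text[self._start:self._stop]

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self))
            if step != 1:
                # Rare case: fall back to slicing a copy of the view.
                return str(self)[key]
            return self._text[self._start + start:self._start + stop]
        if key < 0:
            key += len(self)
        if not 0 <= key < len(self):
            raise IndexError('StringSlice index out of range')
        return self._text[self._start + key]

    def chop(self, n):
        """Drop n items from the front (n > 0) or -n items from the end (n < 0)."""
        if abs(n) > len(self):
            raise ValueError('cannot chop more items than remain')
        if n >= 0:
            self._start += n
        else:
            self._stop += n

    def startswith(self, prefix):
        # str.startswith accepts start/end bounds, so nothing is copied.
        return self._text.startswith(prefix, self._start, self._stop)

    def isdigit(self):
        return str(self).isdigit()


s = StringSlice('abcdef')
s.chop(1)       # the view now covers 'bcdef'
s.chop(-1)      # the view now covers 'bcde'
print(s[0], s[-1], s[1:], s.startswith('b'))
```

Chopping only moves an index, so each chop is O(1) regardless of the length of the underlying string.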


To answer your question, I tried
https://pypi.python.org/pypi?%3Aaction=search&term=string+view&submit=search

and did not find anything. 'view' matches the generic use of 'view', as well as 'views', 'viewed', 'viewer', 'review', and 'preview'.


The third answer here
https://stackoverflow.com/questions/10085568/slices-to-immutable-strings-by-reference-and-not-copy

has a StringView class that could be modified to work on 3.x by removing the unneeded use of buffer.


> Or something like the following behavior:

> >>> s = StringSlice('abcdef')

s = 'abcdef'
a, b = 0, len(s)  # s start, s end

> StringSlice('abcdef') at xxx
> >>> s[0]

s[a]

> 'a'
> >>> s.chop(1) # chop the first item
> >>> s[0] # 'b' is the new first item

a += 1
s[a]

> 'b'
> >>> s[:2]

s[a:a+2]

> 'bc'
> >>> s.chop(-1) # chop the last item
> >>> s[-1]

b -= 1
s[b-1]

> 'e'
> >>> s[1:]

s[a+1:b]

> 'cde'
> >>> while s[0] != 'e':
>         s.chop(1)
> >>> s[0]

while s[a] != 'e':
    a += 1
s[a]

> 'e'
> >>> s.startswith('e')

s[a:b].startswith('e')

> True
> >>> s.isdigit()

s[a:b].isdigit()

> False

> Subroutines could chop the number of processed items internally if no error
> occurs.
>
> Another possibility would be to chop the current item manually. But I don't know
> how efficient this is in the case of large strings.
>
> while string:
>     c = string[0]
>     # process it ...
>     string = string[1:]

This is extremely bad as it replaces the O(n) processing (below) with O(n*n) processing. In general, the right way to linearly process any iterable is


for item in iterable:
  process(item)

or sometimes

for index, item in enumerate(iterable):
  process(index, item)

or even, for sequences, (but not when the first option above suffices)

for index in range(len(sequence)):
  process(index, sequence)
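To make the O(n*n) versus O(n) difference concrete, the following sketch counts work instead of timing it. Both function names are invented for this illustration, and the count assumes that each string[1:] builds a new string containing all of the remaining characters, which is how str slicing behaves.

```python
def chars_copied_by_slicing(string):
    """Count the characters copied by the `string = string[1:]` loop."""
    copied = 0
    while string:
        c = string[0]           # process it ...
        string = string[1:]     # builds a new string, one character shorter
        copied += len(string)
    return copied


def items_visited_by_iteration(string):
    """Count the items touched by a plain for loop; nothing is copied."""
    visited = 0
    for item in string:
        visited += 1
    return visited


n = 1000
slicing_cost = chars_copied_by_slicing('x' * n)        # n * (n - 1) / 2
iteration_cost = items_visited_by_iteration('x' * n)   # n
print(slicing_cost, iteration_cost)
```

For n = 1000 the slicing loop copies 499500 characters while the for loop visits 1000 items, and the gap grows quadratically with n.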

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
