On 10/21/2014 10:32 AM, CWr wrote:

Hello together,

currently I have to parse a string in an atomic way. Normally - in this case 
too - I have a counter variable to keep the current position inside the string. 
So far, I think this is the most flexible solution to do some lookarounds 
inside the string if necessary. Subroutines will be fed the underlying data 
and the current position. A subroutine returns a tuple of the new position and 
the result. But I would like to process subroutines with the same flexibility 
(slicing/lookaround) but without returning the new position every time.

Is there any implementation like C++ StringPiece class?

I am going to guess that this is a string view class that encapsulates a piece of an underlying object. Otherwise there is no point.

A view class depends on a primary, independently accessible class for its data. There are two main categories. A subview gives the primary class interface to a part of the primary data. Numpy has array subviews, and I presume you are talking about string subviews here. An altview class gives an alternative interface to the primary data. Dict views are examples.

If the primary object is mutable, one reason to use a view instead of a copy is to keep the data for two objects synchronized. This does not apply to strings.
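As a small illustration of the synchronization point, here is a sketch using a dict keys view (one of the altviews mentioned above) next to an independent copy. The names d, keys and snapshot are made up for the example.

```python
# A dict keys view stays synchronized with the dict; a list copy does not.
d = {'a': 1, 'b': 2}
keys = d.keys()       # a view onto d, not a copy
snapshot = list(d)    # an independent copy of the keys

d['c'] = 3            # mutate the primary object

print(list(keys))     # ['a', 'b', 'c']  (the view sees the change)
print(snapshot)       # ['a', 'b']       (the copy does not)
```

Since a str cannot be mutated, a string view and a string copy could never diverge this way, which is why this reason does not carry over to strings.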

Another reason is to save memory space. The downside is that the primary data cannot be erased until *both* objects are deleted. Moreover, if the primary data is small or the subview data is a small fraction of the primary data, the memory saving is small. So small subviews that persist after the primary object may end up costing more memory than they save. This is one reason Python does not have string subviews. The numpy array view use case is large subarrays of large arrays that have to persist through a calculation anyway.

Another reason Python lacks sequence subviews is that the extra data needed for a contiguous slice are only the start and stop indexes. These can easily be manipulated directly without wrapping them in a class. And anyone who does want a method interface can easily create a class to their liking.
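For anyone who does want the method interface, here is a minimal sketch of such a wrapper. It is not an existing library class: the name StringSlice and the chop method are taken from the question, and everything else (attribute names, error handling, supporting only step-1 slices) is an assumption made for the example.

```python
class StringSlice:
    """Hypothetical read-only window onto a str, kept as start/stop indexes."""

    def __init__(self, data, start=0, stop=None):
        self.data = data
        self.start = start
        self.stop = len(data) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __str__(self):
        # Copies the window, but only when explicitly asked for.
        return self.data[self.start:self.stop]

    def __repr__(self):
        return 'StringSlice(%r)' % str(self)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Resolve the slice relative to the window, then copy that part.
            lo, hi, step = index.indices(len(self))
            if step != 1:
                raise ValueError('this sketch only supports step 1')
            return self.data[self.start + lo:self.start + hi]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError('StringSlice index out of range')
        return self.data[self.start + index]

    def chop(self, n):
        """Drop n items from the front (n > 0) or -n from the back (n < 0)."""
        if abs(n) > len(self):
            raise ValueError('cannot chop more items than the window holds')
        if n >= 0:
            self.start += n
        else:
            self.stop += n

    def startswith(self, prefix):
        # str.startswith takes start/end arguments, so no copy is needed.
        return self.data.startswith(prefix, self.start, self.stop)


s = StringSlice('abcdef')
s.chop(1)        # window is now 'bcdef'
s.chop(-1)       # window is now 'bcde'
print(s[0], s[-1], s[1:], s.startswith('b'))
```

The class holds nothing beyond the two indexes and a reference to the string, so it is only a convenience over manipulating the indexes directly.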

To answer your question, I tried
https://pypi.python.org/pypi?%3Aaction=search&term=string+view&submit=search

and did not find anything. 'view' matches the generic use of 'view', as well as 'views', 'viewed', 'viewer', 'review', and 'preview'.

The third answer here
https://stackoverflow.com/questions/10085568/slices-to-immutable-strings-by-reference-and-not-copy
has a StringView class that could be modified to work on 3.x by removing the unneeded use of buffer.

> Or something like the following behavior:

s = StringSlice('abcdef')

s = 'abcdef'
a, b = 0, len(s)  # s start, s end

s
StringSlice('abcdef') at xxx
s[0]

s[a]

'a'
s.chop(1) # chop the first item
s[0] # 'b' is the new first item

a += 1
s[a]

'b'
s[:2]

s[a:a+2]

'bc'
s.chop(-1) # chop the last item
s[-1]

b -= 1
s[b-1]

'e'
s[1:]

s[a+1:b]

'cde'
while s[0] != 'e':
        s.chop(1)
s[0]

while s[a] != 'e':
    a += 1
s[a]

'e'
s.startswith('e')

s[a:b].startswith('e')

True
s.isdigit()

s[a:b].isdigit()

False

Subroutines could chop the number of processed items internally if no error 
occurs.

Another possibility would be to chop the current item manually. But I don't know 
how efficient this is in case of large strings.

while string:
        c = string[0]
        # process it ...
        string = string[1:]

This is extremely bad as it replaces the O(n) processing (below) with O(n*n) processing. In general, the right way to linearly process any iterable is

for item in iterable:
  process(item)

or sometimes

for index, item in enumerate(iterable):
  process(index, item)

or even, for sequences, (but not when the first option above suffices)

for index in range(len(sequence)):
  process(index, sequence)
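To make the cost difference concrete, here is a rough sketch that counts the work done by each approach. The function names are invented for the example, and the count assumes that every slice builds a new string, as it does for str in CPython.

```python
def copied_by_chopping(string):
    """Run the chopping loop and count the characters copied by the slices."""
    copied = 0
    while string:
        c = string[0]           # process it ...
        string = string[1:]     # builds a new string of len(string) - 1
        copied += len(string)
    return copied


def visited_by_iterating(string):
    """Run the plain for loop and count the characters visited."""
    visited = 0
    for item in string:
        visited += 1
    return visited


n = 1000
text = 'x' * n
print(copied_by_chopping(text))     # n*(n-1)//2 = 499500
print(visited_by_iterating(text))   # 1000
```

Doubling n roughly quadruples the first number but only doubles the second.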

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
