rahulra...@gmail.com writes:

> Hi All,
>
> I have a string which looks like
>
> aaaaa,bbbbb,ccccc "4873898374", ddddd, eeeeee "3343,23,23,5,,5,45", fffff
> "5546,3434,345,34,34,5,34,543,7"
>
> It is a comma separated string, but some of the fields have a double
> quoted string as part of it (and that double quoted string can have
> commas). The above string has only 6 fields. First is aaaaa, second is
> bbbbb and last is fffff "5546,3434,345,34,34,5,34,543,7". How can I
> split this string in its fields using regular expression ? or even if
> there is any other way to do this, please speak out.
If you have any control over the source of this data, try to change the
source so that it writes proper CSV. Then you can use the csv module to
parse the data.

As it is, csv.reader failed me. Perhaps someone else knows how it should
be parameterized to deal with this?

    len(next(csv.reader(io.StringIO(s)))) == 20
    len(next(csv.reader(io.StringIO(s), doublequote = False))) == 20

Here's a regex solution that assumes that there is something in a field
before the doublequoted part, then at most one doublequoted part and
nothing after the doublequoted part.

    len(re.findall(r'([^",]+(?:"[^"]*")?)', s)) == 6

    re.findall(r'([^",]+(?:"[^"]*")?)', s)
    ['aaaaa', 'bbbbb', 'ccccc "4873898374"', ' ddddd',
     ' eeeeee "3343,23,23,5,,5,45"',
     ' fffff "5546,3434,345,34,34,5,34,543,7"']

The outermost parentheses in the pattern make the whole pattern a
capturing group. They are redundant above (with re.findall) but
important in the following alternative (with re.split).

    re.split(r'([^",]+(?:"[^"]*")?)', s)
    ['', 'aaaaa', ',', 'bbbbb', ',', 'ccccc "4873898374"', ',', ' ddddd',
     ',', ' eeeeee "3343,23,23,5,,5,45"', ',',
     ' fffff "5546,3434,345,34,34,5,34,543,7"', '']

This splits the string with the pattern that matches the actual data.
With the capturing group it also returns the actual data. One could then
check that the assumptions hold and every other value is just a comma. I
would make that check and throw an exception on failure.
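
For what it's worth, here is a minimal sketch of that check, assuming the same pattern as above. The function name split_fields, the choice of ValueError, and the stripping of whitespace around each field are my own assumptions for illustration, not anything the data format dictates.

```python
import re

# Same pattern as above: some text, then at most one doublequoted part.
FIELD = re.compile(r'([^",]+(?:"[^"]*")?)')


def split_fields(s):
    """Split s into fields, raising ValueError if the assumptions fail.

    Hypothetical helper: the name and the error type are illustrative.
    """
    parts = FIELD.split(s)
    # With a capturing group, re.split alternates between the text
    # between matches (even indices) and the matched fields (odd indices).
    separators = parts[0::2]
    fields = parts[1::2]
    # Nothing may come before the first field or after the last one,
    # and everything between two fields must be exactly one comma.
    if separators[0] != '' or separators[-1] != '':
        raise ValueError('unexpected text around the fields: %r' % (s,))
    if any(sep != ',' for sep in separators[1:-1]):
        raise ValueError('fields not separated by single commas: %r' % (s,))
    # Assumption: surrounding whitespace is not part of the field.
    return [field.strip() for field in fields]


if __name__ == '__main__':
    s = ('aaaaa,bbbbb,ccccc "4873898374", ddddd, '
         'eeeeee "3343,23,23,5,,5,45", '
         'fffff "5546,3434,345,34,34,5,34,543,7"')
    for field in split_fields(s):
        print(field)
```

A field that starts with a doublequote, such as the middle one in 'aaa,"bbb",ccc', breaks the stated assumptions: the leftover quote characters end up in the separators, so the check raises instead of silently returning wrong fields.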