Hi,

I am trying to remove extra quotes from a large set of strings (a list of strings). Each original string looks like this:

    """str_value1"",""str_value2"",""str_value3"",1,""str_value4"""

I would like to remove the start and end quotes and the extra pair of quotes around each string value, so that the result looks like this:

    "str_value1","str_value2","str_value3",1,"str_value4"

and then join the strings with newlines.

I have tried the following code:

    for line in str_lines[1:]:
        strip_start_end_quotes = line[1:-1]
        splited_line_rem_quotes = strip_start_end_quotes.replace('\"\"', '"')
        str_lines[str_lines.index(line)] = splited_line_rem_quotes

    for_pandas_new_headers_str = '\n'.join(str_lines)

but it is really slow (it runs for ages) if the list contains over 1 million lines. I am looking for a faster way to do this.

I also tried to use multiprocessing for this task:

    def preprocess_data_str_line(data_str_lines):
        """
        :param data_str_lines:
        :return:
        """
        for line in data_str_lines:
            strip_start_end_quotes = line[1:-1]
            splited_line_rem_quotes = strip_start_end_quotes.replace('\"\"', '"')
            data_str_lines[data_str_lines.index(line)] = splited_line_rem_quotes

        return data_str_lines


    def multi_process_prepcocess_data_str(data_str_lines):
        """
        :param data_str_lines:
        :return:
        """
        # if cpu load < 25% and 4GB of ram free use 3 cores
        # if cpu load < 70% and 4GB of ram free use 2 cores
        cores_to_use = how_many_core()

        data_str_blocks = slice_list(data_str_lines, cores_to_use)

        for block in data_str_blocks:
            # spawn processes for each data string block assigned to every cpu core
            p = multiprocessing.Process(target=preprocess_data_str_line, args=(block,))
            p.start()

but I don't know how to concatenate the results back into the list so that I can join the strings in the list with newlines.

So, ideally, I would like to combine multiprocessing with a fast function for preprocessing each line to speed up the whole process.

cheers