Hi James,

The RDKit does not have a full-featured CSV parser; writing one is a
non-trivial task. If you need to support general CSV, I'd suggest using
pandas or Python's built-in csv module. It may seem like overkill, but
dealing with all the oddness that can show up in CSVs is really not easy.
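
Here's a minimal sketch of the csv-module route. It is not tested against
real-world files, and `read_smiles_column`, `mols_from_csv` and the
`smiles_column` header name are hypothetical. It assumes the first row is a
header. The csv module strips the quote marks, and each SMILES is then parsed
on its own with Chem.MolFromSmiles rather than going through
SmilesMolSupplier:

```python
import csv
import io


def read_smiles_column(text, smiles_column):
    """Yield (smiles, row) pairs from CSV text.

    The csv module handles the delimiter and any quoting, so the SMILES
    strings come back without the surrounding quote marks.
    Assumes the first row is a header naming the columns.
    """
    # Sniff on a multi-line sample rather than just the first line.
    dialect = csv.Sniffer().sniff(text[:4096])
    reader = csv.DictReader(io.StringIO(text, newline=""), dialect=dialect)
    for row in reader:
        yield row[smiles_column], row


def mols_from_csv(path, smiles_column):
    """Yield (mol, row) pairs; mol is None if RDKit cannot parse the SMILES."""
    # Deferred import so read_smiles_column can be used without RDKit.
    from rdkit import Chem

    with open(path, newline="") as handle:
        text = handle.read()
    for smiles, row in read_smiles_column(text, smiles_column):
        yield Chem.MolFromSmiles(smiles), row
```

Usage would be something like `for mol, row in mols_from_csv(chem_file_name,
"smiles"): ...`. Be aware that csv.Sniffer.sniff() can raise csv.Error, or
guess an odd delimiter, on single-column files. Restricting the candidates
with `sniff(sample, delimiters=",\t;")` may help there.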

Best,
-greg


On Mon, Jan 10, 2022 at 11:15 AM James Wallace <jeawall...@gmail.com> wrote:

> As the subject suggests, I'm trying to find a universal solution for
> reading CSVs via the SmilesMolSupplier (the input could be single-column
> or multi-column, and using the pandas tools for interconversion seems like
> overkill).
>
> The general structure I use for analysing the CSV is:
>
>
> import csv
> from rdkit import Chem
>
> sniffer = csv.Sniffer()
> with open(chem_file_name, "r", newline="") as csv_upload_file:
>     # sniff a multi-line sample: has_header() needs more than one row
>     sample = csv_upload_file.read(4096)
>     dialect = sniffer.sniff(sample)
>     has_header = sniffer.has_header(sample)
>
> # assumed: smi_col_header is the integer index of the SMILES column
> supplier = Chem.SmilesMolSupplier(chem_file_name,
>                                   delimiter=str(dialect.delimiter),
>                                   smilesColumn=smi_col_header,
>                                   nameColumn=-1, titleLine=has_header)
>
> If I use a CSV without quoted data, this is fine: I can autodetect the
> delimiter, the column header is loaded in by the rest of my workflow, and
> everything else is worked out by the CSV sniffer. However, when the data are
> quoted, parsing fails because the quote marks are passed through as part of
> the SMILES:
>
> [10:09:56] SMILES Parse Error: syntax error for input: '"C1=CC=CC=C1"'
> [10:09:56] ERROR: Smiles parse error on line 1
>
> Is there some easy way of handling this, or do I have to mandate not using
> quoting of data in the CSV generation?
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>