Makes sense - I'd make this as consistent with to_json / from_json as possible.
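For reference, from_json already takes its options as a plain Map in the DataFrame API; a minimal sketch of that shape (the toy DataFrame and its json_col column are made up for illustration):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Toy DataFrame with a JSON string column, standing in for e.g. a Kafka payload.
    val df = Seq("""{"a": 1, "b": "x"}""").toDF("json_col")

    val schema = new StructType().add("a", IntegerType).add("b", StringType)

    // from_json(Column, StructType, Map[String, String]) -- a from_csv with the
    // same shape would keep the two functions symmetric.
    val parsed = df.select(
      from_json(col("json_col"), schema, Map("mode" -> "PERMISSIVE")).as("parsed"))
    parsed.show(false)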
How would this work in SQL? I.e., how would passing the options in work? (One possible shape, mirroring from_json's SQL form, is sketched after the quoted proposal below.)

-- excuse the brevity and lower case due to wrist injury

On Sat, Sep 15, 2018 at 2:58 AM Maxim Gekk <maxim.g...@databricks.com> wrote:
> Hi All,
>
> I would like to propose a new function, from_csv(), for parsing columns
> containing strings in CSV format. Here is my PR:
> https://github.com/apache/spark/pull/22379
>
> A use case is loading a dataset from external storage, a DBMS, or a system
> like Kafka, where CSV content was dumped as one of the columns/fields.
> Other columns could contain related information such as timestamps, ids,
> sources of the data, etc. The column with CSV strings can be parsed by the
> existing csv() method of DataFrameReader, but in that case we have to
> "clean up" the dataset and remove the other columns, since the csv()
> method requires a Dataset[String]. Joining the parsed result back to the
> original dataset by position is expensive and not convenient. Instead,
> users parse CSV columns with string functions. That approach is usually
> error prone, especially for quoted values and other special cases.
>
> The methods proposed in the PR should make for a better user experience
> when parsing CSV-like columns. Please share your thoughts.
>
> --
>
> Maxim Gekk
>
> Technical Solutions Lead
>
> Databricks Inc.
>
> maxim.g...@databricks.com
>
> databricks.com
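The sketch referenced above: in SQL, from_json already passes options as a map literal in the third argument, so one answer is for from_csv to follow that convention. The second statement is hypothetical - from_csv only exists in the proposed PR - and the column values are made up:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // Existing pattern: from_json in SQL takes options as a map literal
    // in the third argument (example adapted from the Spark docs).
    spark.sql(
      """SELECT from_json('{"time":"26/08/2015"}', 'time Timestamp',
        |                 map('timestampFormat', 'dd/MM/yyyy'))""".stripMargin).show(false)

    // Hypothetical from_csv analogue: schema as a DDL string, options as a map.
    // This does not run today; it assumes the PR mirrors from_json's SQL form.
    spark.sql(
      """SELECT from_csv('26/08/2015', 'time Timestamp',
        |                map('timestampFormat', 'dd/MM/yyyy'))""".stripMargin).show(false)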