[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader
[ https://issues.apache.org/jira/browse/ARROW-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-11243:
-----------------------------------
    Fix Version/s:     (was: 5.0.0)
                   6.0.0

> [C++] Parse time32 from string and infer in CSV reader
> ------------------------------------------------------
>
>                 Key: ARROW-11243
>                 URL: https://issues.apache.org/jira/browse/ARROW-11243
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 2.0.0
>         Environment: Ubuntu 18.04, R 4.0.3
>            Reporter: Jared Lander
>            Assignee: Antoine Pitrou
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 6.0.0
>
>         Attachments: sampletimedata.csv
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> When reading a CSV with read_csv_arrow() with date types and time types, the
> dates are read as datetimes rather than dates and times are read as
> characters rather than time.
> The first problem can be fixed by supplying date32() to schema(), though
> better inference would be nice. However, supplying time32() to schema()
> causes an error.
> Here is a sample dataset, also attached.
> date,time,reading
> 2021-01-01,00:00:00,67.8
> 2021-01-01,00:00:00,72.4
> 2021-01-01,00:00:00,63.1
> 2021-01-01,00:05:00,67.8
> Reading with readr::read_csv() results in a tibble with three columns: date,
> time, dbl, as expected.
>
> {code:r}
> samp_readr <- readr::read_csv('sampledata.csv')
> samp_readr
> {code}
> {code:r}
> # A tibble: 4 x 3
>   date       time   reading
>   <date>     <time>   <dbl>
> 1 2021-01-01 00'00"    67.8
> 2 2021-01-01 00'00"    72.4
> 3 2021-01-01 00'00"    63.1
> 4 2021-01-01 05'00"    67.8
> {code}
> Reading with arrow::read_csv_arrow() without providing schema() results in a
> tibble with three columns: dttm, chr, dbl.
> {code:r}
> samp_arrow_plain <- arrow::read_csv_arrow('sampledata.csv')
> samp_arrow_plain
> {code}
> {code:r}
> # A tibble: 4 x 3
>   date                time     reading
>   <dttm>              <chr>      <dbl>
> 1 2020-12-31 19:00:00 00:00:00    67.8
> 2 2020-12-31 19:00:00 00:00:00    72.4
> 3 2020-12-31 19:00:00 00:00:00    63.1
> 4 2020-12-31 19:00:00 00:05:00    67.8
> {code}
> Reading with arrow::read_csv_arrow() and providing date=date32() via schema()
> to col_types results in a tibble with three columns: date, chr, dbl.
> {code:r}
> samp_arrow_date <- arrow::read_csv_arrow('sampledata.csv',
>     col_types=schema(date=date32()))
> samp_arrow_date
> {code}
> {code:r}
> # A tibble: 4 x 3
>   date       time     reading
>   <date>     <chr>      <dbl>
> 1 2021-01-01 00:00:00    67.8
> 2 2021-01-01 00:00:00    72.4
> 3 2021-01-01 00:00:00    63.1
> 4 2021-01-01 00:05:00    67.8
> {code}
> Reading with arrow::read_csv_arrow() and providing time=time32() via schema()
> to col_types generates an error.
> {code:r}
> samp_arrow_time <- arrow::read_csv_arrow('sampledata.csv',
>     col_types=schema(time=time32()))
> {code}
> {code:r}
> Error in csv___TableReader__Read(self) :
>   NotImplemented: CSV conversion to time32[ms] is not supported
> {code}
> The same error occurs when using compact string notation.
> {code:r}
> samp_arrow_string <- arrow::read_csv_arrow('sampledata.csv', col_types='DTc',
>     col_names=c('date', 'time', 'reading'), skip=1)
> {code}
> {code:r}
> Error in csv___TableReader__Read(self) :
>   NotImplemented: CSV conversion to time32[ms] is not supported
> {code}
> This is something in the internals, so far beyond me to figure out a fix, but
> I saw it in action and wanted to report it.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
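For context on what the requested conversion involves: an Arrow time32 value is a 32-bit integer counting seconds or milliseconds since midnight, so turning a cell such as 00:05:00 into time32 comes down to validating the text and computing that count. The following is a minimal sketch under that assumption only. It is not the parser Arrow uses, and the name ParseTime32Seconds is hypothetical; it accepts a strict HH:MM:SS form and nothing else.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of Arrow): parse a strict "HH:MM:SS" string
// into seconds since midnight, the quantity a time32[s] value stores.
// Returns std::nullopt for anything that does not match that exact shape.
std::optional<std::int32_t> ParseTime32Seconds(std::string_view s) {
  // Require exactly 8 characters with ':' separators at positions 2 and 5.
  if (s.size() != 8 || s[2] != ':' || s[5] != ':') return std::nullopt;

  // Read the two-digit field starting at `pos`; yields -1 on a non-digit.
  auto two_digits = [&](std::size_t pos) -> int {
    const char a = s[pos];
    const char b = s[pos + 1];
    if (a < '0' || a > '9' || b < '0' || b > '9') return -1;
    return (a - '0') * 10 + (b - '0');
  };

  const int hours = two_digits(0);
  const int minutes = two_digits(3);
  const int seconds = two_digits(6);

  // Reject malformed fields and out-of-range clock values.
  if (hours < 0 || hours > 23) return std::nullopt;
  if (minutes < 0 || minutes > 59) return std::nullopt;
  if (seconds < 0 || seconds > 59) return std::nullopt;

  return static_cast<std::int32_t>(hours * 3600 + minutes * 60 + seconds);
}
```

Applied to the sample data, the time column's two distinct values 00:00:00 and 00:05:00 map to 0 and 300 seconds. Multiplying by 1000 gives the millisecond count for the time32[ms] unit named in the error message.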
[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader
[ https://issues.apache.org/jira/browse/ARROW-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-11243:
-----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Parse time32 from string and infer in CSV reader
> ------------------------------------------------------
>
>                 Key: ARROW-11243
>                 URL: https://issues.apache.org/jira/browse/ARROW-11243
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 2.0.0
>         Environment: Ubuntu 18.04, R 4.0.3
>            Reporter: Jared Lander
>            Assignee: Antoine Pitrou
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 6.0.0
>
>         Attachments: sampletimedata.csv
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader
[ https://issues.apache.org/jira/browse/ARROW-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Molina updated ARROW-11243:
--------------------------------------
    Fix Version/s:     (was: 5.0.0)
                   6.0.0

> [C++] Parse time32 from string and infer in CSV reader
> ------------------------------------------------------
>
>                 Key: ARROW-11243
>                 URL: https://issues.apache.org/jira/browse/ARROW-11243
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 2.0.0
>         Environment: Ubuntu 18.04, R 4.0.3
>            Reporter: Jared Lander
>            Priority: Minor
>             Fix For: 6.0.0
>
>         Attachments: sampletimedata.csv
[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader
[ https://issues.apache.org/jira/browse/ARROW-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-11243:
-----------------------------------
    Fix Version/s:     (was: 4.0.0)
                   5.0.0

> [C++] Parse time32 from string and infer in CSV reader
> ------------------------------------------------------
>
>                 Key: ARROW-11243
>                 URL: https://issues.apache.org/jira/browse/ARROW-11243
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 2.0.0
>         Environment: Ubuntu 18.04, R 4.0.3
>            Reporter: Jared Lander
>            Priority: Minor
>             Fix For: 5.0.0
>
>         Attachments: sampletimedata.csv
[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader
[ https://issues.apache.org/jira/browse/ARROW-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-11243:
    Fix Version/s: 4.0.0

> [C++] Parse time32 from string and infer in CSV reader
> ------------------------------------------------------
>
>                 Key: ARROW-11243
>                 URL: https://issues.apache.org/jira/browse/ARROW-11243
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 2.0.0
>         Environment: Ubuntu 18.04, R 4.0.3
>            Reporter: Jared Lander
>            Priority: Minor
>             Fix For: 4.0.0
>
>         Attachments: sampletimedata.csv
[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader
[ https://issues.apache.org/jira/browse/ARROW-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-11243:
    Issue Type: Improvement  (was: Bug)

> [C++] Parse time32 from string and infer in CSV reader
> ------------------------------------------------------
>
>                 Key: ARROW-11243
>                 URL: https://issues.apache.org/jira/browse/ARROW-11243
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 2.0.0
>         Environment: Ubuntu 18.04, R 4.0.3
>            Reporter: Jared Lander
>            Priority: Minor
>         Attachments: sampletimedata.csv
[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader
[ https://issues.apache.org/jira/browse/ARROW-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-11243:
    Summary: [C++] Parse time32 from string and infer in CSV reader  (was: Cannot use time32() in col_types=schema() when reading CSV with read_csv_arrow())

> [C++] Parse time32 from string and infer in CSV reader
> ------------------------------------------------------
>
>                 Key: ARROW-11243
>                 URL: https://issues.apache.org/jira/browse/ARROW-11243
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 2.0.0
>         Environment: Ubuntu 18.04, R 4.0.3
>            Reporter: Jared Lander
>            Priority: Minor
>         Attachments: sampletimedata.csv
[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader
[ https://issues.apache.org/jira/browse/ARROW-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-11243:
    Component/s:     (was: R)
                 C++

> [C++] Parse time32 from string and infer in CSV reader
> ------------------------------------------------------
>
>                 Key: ARROW-11243
>                 URL: https://issues.apache.org/jira/browse/ARROW-11243
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 2.0.0
>         Environment: Ubuntu 18.04, R 4.0.3
>            Reporter: Jared Lander
>            Priority: Minor
>             Fix For: 4.0.0
>
>         Attachments: sampletimedata.csv