[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader

2021-07-27 Thread Antoine Pitrou (Jira)


[ https://issues.apache.org/jira/browse/ARROW-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-11243:
---
Fix Version/s: (was: 5.0.0)
   6.0.0

> [C++] Parse time32 from string and infer in CSV reader
> --
>
> Key: ARROW-11243
> URL: https://issues.apache.org/jira/browse/ARROW-11243
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
> Environment: Ubuntu 18.04, R 4.0.3
>Reporter: Jared Lander
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 6.0.0
>
> Attachments: sampletimedata.csv
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> When reading a CSV file containing date and time columns with read_csv_arrow(),
> the dates are read as datetimes rather than dates, and the times are read as
> character strings rather than times.
> The first problem can be worked around by supplying date32() in schema(), though
> better type inference would be nice. However, supplying time32() in schema()
> causes an error.
> Here is the sample dataset (also attached as sampletimedata.csv; the code below
> reads it as sampledata.csv).
> {code}
> date,time,reading
> 2021-01-01,00:00:00,67.8
> 2021-01-01,00:00:00,72.4
> 2021-01-01,00:00:00,63.1
> 2021-01-01,00:05:00,67.8
> {code}
> Reading with readr::read_csv() results in a tibble with three columns of types
> date, time, and dbl, as expected.
> {code:r}
> samp_readr <- readr::read_csv('sampledata.csv')
> samp_readr
> {code}
> {code:r}
> # A tibble: 4 x 3
>   date       time   reading
>   <date>     <time>   <dbl>
> 1 2021-01-01 00'00"    67.8
> 2 2021-01-01 00'00"    72.4
> 3 2021-01-01 00'00"    63.1
> 4 2021-01-01 05'00"    67.8
> {code}
> Reading with arrow::read_csv_arrow() without providing a schema() results in a
> tibble with three columns of types dttm, chr, and dbl.
> {code:r}
> samp_arrow_plain <- arrow::read_csv_arrow('sampledata.csv')
> samp_arrow_plain
> {code}
> {code:r}
> # A tibble: 4 x 3
>   date                time     reading
>   <dttm>              <chr>      <dbl>
> 1 2020-12-31 19:00:00 00:00:00    67.8
> 2 2020-12-31 19:00:00 00:00:00    72.4
> 3 2020-12-31 19:00:00 00:00:00    63.1
> 4 2020-12-31 19:00:00 00:05:00    67.8
> {code}
> Reading with arrow::read_csv_arrow() and passing date=date32() via schema()
> to col_types results in a tibble with three columns of types date, chr, and dbl.
> {code:r}
> samp_arrow_date <- arrow::read_csv_arrow('sampledata.csv', 
> col_types=schema(date=date32()))
> samp_arrow_date
> {code}
> {code:r}
> # A tibble: 4 x 3
>   date       time     reading
>   <date>     <chr>      <dbl>
> 1 2021-01-01 00:00:00    67.8
> 2 2021-01-01 00:00:00    72.4
> 3 2021-01-01 00:00:00    63.1
> 4 2021-01-01 00:05:00    67.8
> {code}
> Reading with arrow::read_csv_arrow() and passing time=time32() via schema()
> to col_types raises an error.
> {code:r}
> samp_arrow_time <- arrow::read_csv_arrow('sampledata.csv', 
> col_types=schema(time=time32()))
> {code}
> {code:r}
> Error in csv___TableReader__Read(self) : 
>   NotImplemented: CSV conversion to time32[ms] is not supported
> {code}
> The same error occurs when using compact string notation.
> {code:r}
> samp_arrow_string <- arrow::read_csv_arrow('sampledata.csv', col_types='DTc', 
> col_names=c('date', 'time', 'reading'), skip=1)
> {code}
> {code:r}
> Error in csv___TableReader__Read(self) : 
>   NotImplemented: CSV conversion to time32[ms] is not supported
> {code}
> The cause lies somewhere in the internals, so finding a fix is well beyond me,
> but I ran into it and wanted to report it.
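
The summary also asks the CSV reader to infer time columns, not only to parse them when a schema is supplied. As a rough illustration (a sketch, not how Arrow's C++ type inference actually works), a column could be treated as a time column when every non-missing value is a strict 24-hour HH:MM:SS string. The helper looks_like_time() below is hypothetical and uses only base R:

{code:r}
# Hypothetical helper, not part of arrow: TRUE when every non-missing
# value is a 24-hour "HH:MM:SS" string (no fractional seconds).
looks_like_time <- function(x) {
  x <- x[!is.na(x)]
  pattern <- "^([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$"
  length(x) > 0 && all(grepl(pattern, x))
}

looks_like_time(c("00:00:00", "00:05:00"))   # TRUE for the time column
looks_like_time(c("2021-01-01", "00:05:00")) # FALSE: a date is not a time
{code}

A real reader would presumably also have to decide how to treat fractional seconds and which time32/time64 unit to choose; this sketch ignores both.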

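Until the reader supports this conversion, one possible workaround is to let the time column come through as chr (as in the date32() example above) and convert it afterwards. Arrow's time32 type stores the time of day as a 32-bit integer count of seconds or milliseconds since midnight, and the error message shows that time32() uses milliseconds here. The base-R sketch below computes those millisecond values; hms_to_ms() is a hypothetical helper, not an arrow or readr function, and it assumes strict HH:MM:SS input:

{code:r}
# Hypothetical helper, not part of arrow: convert "HH:MM:SS" strings to
# integer milliseconds since midnight (the value a time32[ms] cell holds).
# Malformed, out-of-range or missing values become NA.
hms_to_ms <- function(x) {
  ok <- !is.na(x) & grepl("^([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$", x)
  out <- rep(NA_integer_, length(x))
  if (any(ok)) {
    h <- as.integer(substr(x[ok], 1, 2))
    m <- as.integer(substr(x[ok], 4, 5))
    s <- as.integer(substr(x[ok], 7, 8))
    out[ok] <- (h * 3600L + m * 60L + s) * 1000L
  }
  out
}

hms_to_ms(c("00:00:00", "00:05:00", "23:59:59"))
# c(0L, 300000L, 86399000L)
{code}

For analysis inside R, calling hms::as_hms() on the character column is likely the simpler route; the integer form above only matters if the values are meant to end up in an Arrow time32 column.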


--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader

2021-07-22 Thread ASF GitHub Bot (Jira)


ASF GitHub Bot updated ARROW-11243:
---
Labels: pull-request-available  (was: )

> [C++] Parse time32 from string and infer in CSV reader
> --
>
> Key: ARROW-11243
> URL: https://issues.apache.org/jira/browse/ARROW-11243
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
> Environment: Ubuntu 18.04, R 4.0.3
>Reporter: Jared Lander
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 6.0.0
>
> Attachments: sampletimedata.csv
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>


[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader

2021-07-05 Thread Alessandro Molina (Jira)


Alessandro Molina updated ARROW-11243:
--
Fix Version/s: (was: 5.0.0)
   6.0.0

> [C++] Parse time32 from string and infer in CSV reader
> --
>
> Key: ARROW-11243
> URL: https://issues.apache.org/jira/browse/ARROW-11243
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
> Environment: Ubuntu 18.04, R 4.0.3
>Reporter: Jared Lander
>Priority: Minor
> Fix For: 6.0.0
>
> Attachments: sampletimedata.csv
>
>


[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader

2021-04-06 Thread Antoine Pitrou (Jira)


Antoine Pitrou updated ARROW-11243:
---
Fix Version/s: (was: 4.0.0)
   5.0.0

> [C++] Parse time32 from string and infer in CSV reader
> --
>
> Key: ARROW-11243
> URL: https://issues.apache.org/jira/browse/ARROW-11243
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
> Environment: Ubuntu 18.04, R 4.0.3
>Reporter: Jared Lander
>Priority: Minor
> Fix For: 5.0.0
>
> Attachments: sampletimedata.csv
>
>


[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader

2021-01-13 Thread Neal Richardson (Jira)


Neal Richardson updated ARROW-11243:

Fix Version/s: 4.0.0

> [C++] Parse time32 from string and infer in CSV reader
> --
>
> Key: ARROW-11243
> URL: https://issues.apache.org/jira/browse/ARROW-11243
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 2.0.0
> Environment: Ubuntu 18.04, R 4.0.3
>Reporter: Jared Lander
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: sampletimedata.csv
>
>


[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader

2021-01-13 Thread Neal Richardson (Jira)


Neal Richardson updated ARROW-11243:

Issue Type: Improvement  (was: Bug)

> [C++] Parse time32 from string and infer in CSV reader
> --
>
> Key: ARROW-11243
> URL: https://issues.apache.org/jira/browse/ARROW-11243
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 2.0.0
> Environment: Ubuntu 18.04, R 4.0.3
>Reporter: Jared Lander
>Priority: Minor
> Attachments: sampletimedata.csv
>
>


[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader

2021-01-13 Thread Neal Richardson (Jira)


Neal Richardson updated ARROW-11243:

Summary: [C++] Parse time32 from string and infer in CSV reader  (was: 
Cannot use time32() in col_types=schema() when reading CSV with 
read_csv_arrow())

> [C++] Parse time32 from string and infer in CSV reader
> --
>
> Key: ARROW-11243
> URL: https://issues.apache.org/jira/browse/ARROW-11243
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.0.0
> Environment: Ubuntu 18.04, R 4.0.3
>Reporter: Jared Lander
>Priority: Minor
> Attachments: sampletimedata.csv
>
>


[jira] [Updated] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader

2021-01-13 Thread Neal Richardson (Jira)


Neal Richardson updated ARROW-11243:

Component/s: (was: R)
 C++

> [C++] Parse time32 from string and infer in CSV reader
> --
>
> Key: ARROW-11243
> URL: https://issues.apache.org/jira/browse/ARROW-11243
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 2.0.0
> Environment: Ubuntu 18.04, R 4.0.3
>Reporter: Jared Lander
>Priority: Minor
> Fix For: 4.0.0
>
> Attachments: sampletimedata.csv
>
>