Re: Can the R interface to write_parquet accept strings?
I'm referring to the arrow-devel and parquet-devel packages, which are C++
packages. If you built the R library (using install.packages) against version
0.14.0 and then upgraded arrow-devel to 0.14.1 without rebuilding the R
library, you could have this issue. I would recommend reinstalling the R
package and seeing if the problem goes away.

On Mon, Sep 9, 2019, 6:34 PM Daniel Feenberg wrote:

> On Mon, 9 Sep 2019, Wes McKinney wrote:
>
> > I'm a bit confused by the error message
> >
> > "
> > Error in write_parquet_file(to_arrow(table), file) :
> >   Arrow error: IOError: Metadata contains Thrift LogicalType that is
> >   not recognized.
> > "
> >
> > This error comes from
> >
> > https://github.com/apache/arrow/blob/master/cpp/src/parquet/types.cc#L455
> >
> > This function should not be called at all during the execution of
> > "write_parquet_file".
> >
> > Daniel, is it possible you changed the C++ library installed after
> > building the "arrow" R package? The R package must generally be
> > recompiled when the C++ library is upgraded
>
> We are not aware of changing anything in C++. It is just as yum left it.
> We didn't compile the R arrow package at all, just used what yum supplied
> from the distribution. Are you suggesting we compile the R package
> ourselves, that the Scientific Linux distribution packages are
> inconsistent? Note that the default C++ is rather old and it would be a
> problem to update it, since so many other packages depend on it. But we
> could update Arrow, I suppose.
>
> Daniel Feenberg
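A minimal sketch of the reinstall-and-retest step recommended in the reply
above. It assumes the R package is installed from CRAN, so that on Linux it
should be compiled from source against whichever arrow-devel and parquet-devel
are currently installed. The CRAN mirror URL and the temporary file path are
illustrative choices and do not come from the thread.

```r
# Reinstall the R package so it is rebuilt against the C++ libraries
# that yum currently provides (arrow-devel / parquet-devel).
# The mirror below is an assumption; any CRAN mirror should work.
install.packages("arrow", repos = "https://cloud.r-project.org")

library(arrow)

# Report which version of the R package is now loaded.
print(packageVersion("arrow"))

# Retry the string round trip from the thread, writing to a temporary file.
path <- tempfile(fileext = ".parquet")
df <- data.frame(y = c("a", "b", "c"), stringsAsFactors = FALSE)
write_parquet(df, path)
result <- read_parquet(path)
print(result)
```

Running this in a fresh R session avoids keeping the previously loaded build
of the package in memory.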
Re: Can the R interface to write_parquet accept strings?
On Mon, 9 Sep 2019, Wes McKinney wrote:

> I'm a bit confused by the error message
>
> "
> Error in write_parquet_file(to_arrow(table), file) :
>   Arrow error: IOError: Metadata contains Thrift LogicalType that is
>   not recognized.
> "
>
> This error comes from
>
> https://github.com/apache/arrow/blob/master/cpp/src/parquet/types.cc#L455
>
> This function should not be called at all during the execution of
> "write_parquet_file".
>
> Daniel, is it possible you changed the C++ library installed after
> building the "arrow" R package? The R package must generally be
> recompiled when the C++ library is upgraded

We are not aware of changing anything in C++. It is just as yum left it.
We didn't compile the R arrow package at all, just used what yum supplied
from the distribution. Are you suggesting we compile the R package
ourselves, that the Scientific Linux distribution packages are
inconsistent? Note that the default C++ is rather old and it would be a
problem to update it, since so many other packages depend on it. But we
could update Arrow, I suppose.

Daniel Feenberg
Re: Can the R interface to write_parquet accept strings?
I'm a bit confused by the error message

"
Error in write_parquet_file(to_arrow(table), file) :
  Arrow error: IOError: Metadata contains Thrift LogicalType that is
  not recognized.
"

This error comes from

https://github.com/apache/arrow/blob/master/cpp/src/parquet/types.cc#L455

This function should not be called at all during the execution of
"write_parquet_file".

Daniel, is it possible you changed the C++ library installed after
building the "arrow" R package? The R package must generally be
recompiled when the C++ library is upgraded.

On Mon, Sep 9, 2019 at 4:29 PM Daniel Feenberg wrote:

> On Mon, 9 Sep 2019, Neal Richardson wrote:
>
> > Hi Daniel,
> > This works on my machine:
> >
> > > library(arrow)
> > > write_parquet(data.frame(y = c("a", "b", "c"), stringsAsFactors=FALSE), file= "string.parquet")
> > > read_parquet("string.parquet")
> >   y
> > 1 a
> > 2 b
> > 3 c
> > >
> >
> > (The function masking warnings are all from library(tidyverse) and
> > aren't relevant here.)
> >
> > What OS are you on, and how did you install the arrow package? I'm on
> > macOS and installed arrow from CRAN, but if that's not the case for
> > you, then your C++ library may have different capabilities.
>
> Here are the details of our installation:
>
> 1) OS:
> --
> Scientific Linux 7
> uname: Linux 3.10.0-957.1.3.el7.x86_64 #1 SMP Mon Nov 26 12:36:06 CST 2018
> x86_64 x86_64 x86_64 GNU/Linux
>
> 2) gcc version:
>
> # gcc --version
> gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
>
> 3) arrow and parquet library installation:
> --
> yum install arrow-devel parquet-devel
>
> versions:
> arrow-devel: yum info arrow-devel
> Installed Packages
> Name        : arrow-devel
> Arch        : x86_64
> Version     : 0.14.1
> Release     : 1.el7
> Size        : 20 M
> Repo        : installed
> From repo   : apache-arrow
> Summary     : Libraries and header files for Apache Arrow C++
> URL         : https://arrow.apache.org/
> License     : Apache-2.0
> Description : Libraries and header files for Apache Arrow C++.
>
> yum info parquet-devel
> Installed Packages
> Name        : parquet-devel
> Arch        : x86_64
> Version     : 0.14.1
> Release     : 1.el7
> Size        : 6.4 M
> Repo        : installed
> From repo   : apache-arrow
> Summary     : Libraries and header files for Apache Parquet C++
> URL         : https://arrow.apache.org/
> License     : Apache-2.0
> Description : Libraries and header files for Apache Parquet C++.
>
> 4) R arrow installation:
> --
> install.packages("arrow")
>
> and also
>
> install.packages("sparklyr")
>
> Thanks for taking an interest.
>
> Daniel Feenberg
Re: Can the R interface to write_parquet accept strings?
On Mon, 9 Sep 2019, Neal Richardson wrote:

> Hi Daniel,
> This works on my machine:
>
> > library(arrow)
> > write_parquet(data.frame(y = c("a", "b", "c"), stringsAsFactors=FALSE), file= "string.parquet")
> > read_parquet("string.parquet")
>   y
> 1 a
> 2 b
> 3 c
>
> (The function masking warnings are all from library(tidyverse) and
> aren't relevant here.)
>
> What OS are you on, and how did you install the arrow package? I'm on
> macOS and installed arrow from CRAN, but if that's not the case for
> you, then your C++ library may have different capabilities.

Here are the details of our installation:

1) OS:
--
Scientific Linux 7
uname: Linux 3.10.0-957.1.3.el7.x86_64 #1 SMP Mon Nov 26 12:36:06 CST 2018
x86_64 x86_64 x86_64 GNU/Linux

2) gcc version:

# gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)

3) arrow and parquet library installation:
--
yum install arrow-devel parquet-devel

versions:
arrow-devel: yum info arrow-devel
Installed Packages
Name        : arrow-devel
Arch        : x86_64
Version     : 0.14.1
Release     : 1.el7
Size        : 20 M
Repo        : installed
From repo   : apache-arrow
Summary     : Libraries and header files for Apache Arrow C++
URL         : https://arrow.apache.org/
License     : Apache-2.0
Description : Libraries and header files for Apache Arrow C++.

yum info parquet-devel
Installed Packages
Name        : parquet-devel
Arch        : x86_64
Version     : 0.14.1
Release     : 1.el7
Size        : 6.4 M
Repo        : installed
From repo   : apache-arrow
Summary     : Libraries and header files for Apache Parquet C++
URL         : https://arrow.apache.org/
License     : Apache-2.0
Description : Libraries and header files for Apache Parquet C++.

4) R arrow installation:
--
install.packages("arrow")

and also

install.packages("sparklyr")

Thanks for taking an interest.

Daniel Feenberg
Re: Can the R interface to write_parquet accept strings?
Hi Daniel,
This works on my machine:

> library(arrow)
> write_parquet(data.frame(y = c("a", "b", "c"), stringsAsFactors=FALSE), file= "string.parquet")
> read_parquet("string.parquet")
  y
1 a
2 b
3 c
>

(The function masking warnings are all from library(tidyverse) and
aren't relevant here.)

What OS are you on, and how did you install the arrow package? I'm on
macOS and installed arrow from CRAN, but if that's not the case for
you, then your C++ library may have different capabilities.

Neal

On Sun, Sep 8, 2019 at 3:41 AM Daniel Feenberg wrote:

> Can the R interface to Arrow Parquet write string data? Take the
> following script:
>
>    library(arrow)
>    library(tidyverse)
>    write_parquet(table = tibble(y = c("a", "b", "c")), file = "string.parquet")
>
> I get the error message:
>
>    Error in write_parquet_file(to_arrow(table), file) :
>    Arrow error: IOError: Metadata contains Thrift LogicalType that is
>    not recognized.
>
> after warnings that stats::filter(), stats::lag() and
> arrow::read_table() are masked, but I assume that isn't the problem.
> This is with R 3.5.1 and arrow_0.14.1.1
>
> Daniel Feenberg
Can the R interface to write_parquet accept strings?
Can the R interface to Arrow Parquet write string data? Take the
following script:

   library(arrow)
   library(tidyverse)
   write_parquet(table = tibble(y = c("a", "b", "c")), file = "string.parquet")

I get the error message:

   Error in write_parquet_file(to_arrow(table), file) :
   Arrow error: IOError: Metadata contains Thrift LogicalType that is
   not recognized.

after warnings that stats::filter(), stats::lag() and
arrow::read_table() are masked, but I assume that isn't the problem.
This is with R 3.5.1 and arrow_0.14.1.1

Daniel Feenberg