Re: Can the R interface to write_parquet accept strings?

2019-09-09 Thread Wes McKinney
I'm referring to the arrow-devel and parquet-devel packages, which are C++
packages. If you built the R library (using install.packages) against
version 0.14.0 and then upgraded arrow-devel to 0.14.1 without rebuilding
the R library, you could have this issue.

I would recommend reinstalling the R package and seeing if the problem goes
away.
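
To see which C++ package versions are actually installed before
reinstalling, a rough sketch along these lines may help. It only pulls the
"Version" line out of `yum info` output in the format posted earlier in this
thread. The helper names pkg_version and versions_match are made up for
illustration; they are not part of Arrow or yum.

```shell
# Hypothetical helper: read `yum info`-style text on stdin and print the
# value of the first "Version" field, e.g. "Version : 0.14.1" -> 0.14.1
pkg_version() {
  awk -F':' '/^Version/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Hypothetical helper: succeed only if the two version strings are identical.
versions_match() {
  [ "$1" = "$2" ]
}

# Intended use on the affected machine (assumes yum is available):
#   arrow_v=$(yum info arrow-devel | pkg_version)
#   parquet_v=$(yum info parquet-devel | pkg_version)
#   versions_match "$arrow_v" "$parquet_v" || echo "arrow/parquet differ"
```

Matching versions of the two C++ packages do not show which version the R
package was built against, so running install.packages("arrow") again is
still the real test.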

On Mon, Sep 9, 2019, 6:34 PM Daniel Feenberg wrote:

>
>
>
> On Mon, 9 Sep 2019, Wes McKinney wrote:
>
> > I'm a bit confused by the error message
> >
> > "
> > Error in write_parquet_file(to_arrow(table), file) :
> >   Arrow error: IOError: Metadata contains Thrift LogicalType that is
> >   not recognized.
> > "
> >
> > This error comes from
> >
> >
> https://github.com/apache/arrow/blob/master/cpp/src/parquet/types.cc#L455
> >
> > This function should not be called at all during the execution of
> > "write_parquet_file".
> >
> > Daniel, is it possible you changed the C++ library installed after
> > building the "arrow" R package? The R package must generally be
> > recompiled when the C++ library is upgraded.
> >
>
> We are not aware of changing anything in C++. It is just as yum left it.
> We didn't compile the R arrow package at all, just used what yum supplied
> from the distribution. Are you suggesting we compile the R package
> ourselves, that the Scientific Linux distribution packages are
> inconsistent? Note that the default C++ is rather old and it would be a
> problem to update it, since so many other packages depend on it. But we
> could update Arrow, I suppose.
>
> Daniel Feenberg
>


Re: Can the R interface to write_parquet accept strings?

2019-09-09 Thread Daniel Feenberg





On Mon, 9 Sep 2019, Wes McKinney wrote:


I'm a bit confused by the error message

"
Error in write_parquet_file(to_arrow(table), file) :
  Arrow error: IOError: Metadata contains Thrift LogicalType that is
  not recognized.
"

This error comes from

https://github.com/apache/arrow/blob/master/cpp/src/parquet/types.cc#L455

This function should not be called at all during the execution of
"write_parquet_file".

Daniel, is it possible you changed the C++ library installed after
building the "arrow" R package? The R package must generally be
recompiled when the C++ library is upgraded.



We are not aware of changing anything in C++. It is just as yum left it. 
We didn't compile the R arrow package at all, just used what yum supplied 
from the distribution. Are you suggesting we compile the R package 
ourselves, that the Scientific Linux distribution packages are 
inconsistent? Note that the default C++ is rather old and it would be a
problem to update it, since so many other packages depend on it. But we
could update Arrow, I suppose.


Daniel Feenberg


Re: Can the R interface to write_parquet accept strings?

2019-09-09 Thread Wes McKinney
I'm a bit confused by the error message

"
 Error in write_parquet_file(to_arrow(table), file) :
   Arrow error: IOError: Metadata contains Thrift LogicalType that is
   not recognized.
"

This error comes from

https://github.com/apache/arrow/blob/master/cpp/src/parquet/types.cc#L455

This function should not be called at all during the execution of
"write_parquet_file".

Daniel, is it possible you changed the C++ library installed after
building the "arrow" R package? The R package must generally be
recompiled when the C++ library is upgraded.

On Mon, Sep 9, 2019 at 4:29 PM Daniel Feenberg wrote:
>
>
>
> On Mon, 9 Sep 2019, Neal Richardson wrote:
>
> > Hi Daniel,
> > This works on my machine:
> >
> >> library(arrow)
> >> write_parquet(data.frame(y = c("a", "b", "c"), stringsAsFactors=FALSE), file = "string.parquet")
> >> read_parquet("string.parquet")
> >  y
> > 1 a
> > 2 b
> > 3 c
> >>
> >
> > (The function masking warnings are all from library(tidyverse) and
> > aren't relevant here.)
> >
> > What OS are you on, and how did you install the arrow package? I'm on
> > macOS and installed arrow from CRAN, but if that's not the case for
> > you, then your C++ library may have different capabilities.
>
> Here are the details of our installation:
>
> 1) OS:
> --
> Scientific Linux 7
> uname: Linux 3.10.0-957.1.3.el7.x86_64 #1 SMP Mon Nov 26 12:36:06 CST 2018 
> x86_64 x86_64 x86_64 GNU/Linux
>
> 2) gcc version:
> 
> # gcc --version
> gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
>
>
> 3) arrow and parquet library installation:
> --
> yum install arrow-devel parquet-devel
>
> versions:
> arrow-devel: yum info arrow-devel
> Installed Packages
> Name: arrow-devel
> Arch: x86_64
> Version : 0.14.1
> Release : 1.el7
> Size: 20 M
> Repo: installed
> From repo   : apache-arrow
> Summary : Libraries and header files for Apache Arrow C++
> URL : https://arrow.apache.org/
> License : Apache-2.0
> Description : Libraries and header files for Apache Arrow C++.
>
>
> yum info parquet-devel
> Installed Packages
> Name: parquet-devel
> Arch: x86_64
> Version : 0.14.1
> Release : 1.el7
> Size: 6.4 M
> Repo: installed
> From repo   : apache-arrow
> Summary : Libraries and header files for Apache Parquet C++
> URL : https://arrow.apache.org/
> License : Apache-2.0
> Description : Libraries and header files for Apache Parquet C++.
>
>
> 4) R arrow installation:
> --
> install.packages("arrow")
>
> and also
>
> install.packages("sparklyr")
>
> Thanks for taking an interest.
>
> Daniel Feenberg
>
>
>


Re: Can the R interface to write_parquet accept strings?

2019-09-09 Thread Daniel Feenberg




On Mon, 9 Sep 2019, Neal Richardson wrote:


Hi Daniel,
This works on my machine:


library(arrow)
write_parquet(data.frame(y = c("a", "b", "c"), stringsAsFactors=FALSE), file = "string.parquet")
read_parquet("string.parquet")

 y
1 a
2 b
3 c




(The function masking warnings are all from library(tidyverse) and
aren't relevant here.)

What OS are you on, and how did you install the arrow package? I'm on
macOS and installed arrow from CRAN, but if that's not the case for
you, then your C++ library may have different capabilities.


Here are the details of our installation:

1) OS:
--
Scientific Linux 7
uname: Linux 3.10.0-957.1.3.el7.x86_64 #1 SMP Mon Nov 26 12:36:06 CST 2018 
x86_64 x86_64 x86_64 GNU/Linux

2) gcc version:

# gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)


3) arrow and parquet library installation:
--
yum install arrow-devel parquet-devel

versions:
arrow-devel: yum info arrow-devel
Installed Packages
Name: arrow-devel
Arch: x86_64
Version : 0.14.1
Release : 1.el7
Size: 20 M
Repo: installed
From repo   : apache-arrow
Summary : Libraries and header files for Apache Arrow C++
URL : https://arrow.apache.org/
License : Apache-2.0
Description : Libraries and header files for Apache Arrow C++.


yum info parquet-devel
Installed Packages
Name: parquet-devel
Arch: x86_64
Version : 0.14.1
Release : 1.el7
Size: 6.4 M
Repo: installed
From repo   : apache-arrow
Summary : Libraries and header files for Apache Parquet C++
URL : https://arrow.apache.org/
License : Apache-2.0
Description : Libraries and header files for Apache Parquet C++.


4) R arrow installation:
--
install.packages("arrow")

and also

install.packages("sparklyr")

Thanks for taking an interest.

Daniel Feenberg





Re: Can the R interface to write_parquet accept strings?

2019-09-09 Thread Neal Richardson
Hi Daniel,
This works on my machine:

> library(arrow)
> write_parquet(data.frame(y = c("a", "b", "c"), stringsAsFactors=FALSE), file = "string.parquet")
> read_parquet("string.parquet")
  y
1 a
2 b
3 c
>

(The function masking warnings are all from library(tidyverse) and
aren't relevant here.)

What OS are you on, and how did you install the arrow package? I'm on
macOS and installed arrow from CRAN, but if that's not the case for
you, then your C++ library may have different capabilities.

Neal

On Sun, Sep 8, 2019 at 3:41 AM Daniel Feenberg wrote:
>
> Can the R interface to Arrow Parquet write string data? Take the
> following script:
>
>library(arrow)
>library(tidyverse)
>write_parquet(table = tibble(y = c("a", "b", "c")), file = "string.parquet")
>
> I get the error message:
>
>Error in write_parquet_file(to_arrow(table), file) :
>Arrow error: IOError: Metadata contains Thrift LogicalType that is
>not recognized.
>
> after warnings that stats::filter(), stats::lag() and
> arrow::read_table() are masked, but I assume that isn't the problem.
> This is with R 3.5.1 and arrow_0.14.1.1
>
>
> Daniel Feenberg


Can the R interface to write_parquet accept strings?

2019-09-08 Thread Daniel Feenberg
Can the R interface to Arrow Parquet write string data? Take the
following script:

   library(arrow)
   library(tidyverse)
   write_parquet(table = tibble(y = c("a", "b", "c")), file = "string.parquet")

I get the error message:

   Error in write_parquet_file(to_arrow(table), file) :
   Arrow error: IOError: Metadata contains Thrift LogicalType that is
   not recognized.

after warnings that stats::filter(), stats::lag() and
arrow::read_table() are masked, but I assume that isn't the problem.
This is with R 3.5.1 and arrow_0.14.1.1


Daniel Feenberg