[jira] [Created] (ARROW-16075) Does arrow support S3 bucket retention period setting

2022-03-30 Thread Sifang Li (Jira)
Sifang Li created ARROW-16075:
-

 Summary: Does arrow support S3 bucket retention period setting
 Key: ARROW-16075
 URL: https://issues.apache.org/jira/browse/ARROW-16075
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Sifang Li


I cannot find any documentation on how to set the object lock (retention) 
period when creating a bucket (directory) through Arrow's S3 support. Is there 
a way to do this within Arrow?

Thanks



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ARROW-15790) [C++] field's metadata is not written to Parquet file

2022-03-01 Thread Sifang Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499692#comment-17499692
 ] 

Sifang Li commented on ARROW-15790:
---

Yes, that worked for me. It would be nice if the field metadata were stored 
automatically: I cannot imagine it takes up much space, or why anyone would 
want that information dropped in any scenario.

> [C++] field's metadata is not written to Parquet file
> -
>
> Key: ARROW-15790
> URL: https://issues.apache.org/jira/browse/ARROW-15790
> Project: Apache Arrow
>  Issue Type: Bug
> Environment: Ubuntu
>Reporter: Sifang Li
>Priority: Blocker
>
> I used this code to test writing field metadata to a Parquet file and reading 
> it back:
> [https://gist.github.com/dantrim/33f9f14d0b2d3ec45c022aa05f7a45ee]
>  
> The generated file has no metadata when I read it back in with the code below 
> and print it out: 
>  
> {quote}std::shared_ptr<arrow::io::RandomAccessFile> infile;
> PARQUET_ASSIGN_OR_THROW(infile,
>     arrow::io::ReadableFile::Open("./test.parquet", 
>                                   arrow::default_memory_pool()));
> std::unique_ptr<parquet::arrow::FileReader> reader;
> PARQUET_THROW_NOT_OK(
>     parquet::arrow::OpenFile(infile, arrow::default_memory_pool(), &reader));
> std::shared_ptr<arrow::Table> table;
> PARQUET_THROW_NOT_OK(reader->ReadTable(&table));
> EXPECT_EQ(frameCount, table->num_rows());
> std::cout<<"==="ToString(true) < shown{quote}
> Here is the version info:
> libparquet-dev/focal,now 7.0.0-1 amd64 [installed]
> libparquet-glib-dev/focal,now 7.0.0-1 amd64 [installed]
> libparquet-glib700/focal,now 7.0.0-1 amd64 [installed,automatic]
> libparquet700/focal,now 7.0.0-1 amd64 [installed,automatic]





[jira] [Commented] (ARROW-15790) field's metadata is not written to Parquet file

2022-02-28 Thread Sifang Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499224#comment-17499224
 ] 

Sifang Li commented on ARROW-15790:
---

I can see in writer.cc that the metadata is apparently ignored:

 
Status Init() {
  return SchemaManifest::Make(writer_->schema(), /*schema_metadata=*/nullptr,
                              default_arrow_reader_properties(), &schema_manifest_);
}



[jira] [Created] (ARROW-15790) field's metadata is not written to Parquet file

2022-02-25 Thread Sifang Li (Jira)
Sifang Li created ARROW-15790:
-

 Summary: field's metadata is not written to Parquet file
 Key: ARROW-15790
 URL: https://issues.apache.org/jira/browse/ARROW-15790
 Project: Apache Arrow
  Issue Type: Bug
 Environment: Ubuntu
Reporter: Sifang Li


I used this code to test writing field metadata to a Parquet file and reading 
it back:

[https://gist.github.com/dantrim/33f9f14d0b2d3ec45c022aa05f7a45ee]

 

The generated file has no metadata when I read it back in with the code below 
and print it out: 
 
{quote}std::shared_ptr<arrow::io::RandomAccessFile> infile;
PARQUET_ASSIGN_OR_THROW(infile,
    arrow::io::ReadableFile::Open("./test.parquet", arrow::default_memory_pool()));

std::unique_ptr<parquet::arrow::FileReader> reader;
PARQUET_THROW_NOT_OK(
    parquet::arrow::OpenFile(infile, arrow::default_memory_pool(), &reader));
std::shared_ptr<arrow::Table> table;
PARQUET_THROW_NOT_OK(reader->ReadTable(&table));
EXPECT_EQ(frameCount, table->num_rows());
std::cout<<"==="ToString(true) <

[jira] [Closed] (ARROW-15780) missing header file parquet/parquet_version.h

2022-02-24 Thread Sifang Li (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-15780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sifang Li closed ARROW-15780.
-
Resolution: Not A Problem

> missing header file parquet/parquet_version.h
> -
>
> Key: ARROW-15780
> URL: https://issues.apache.org/jira/browse/ARROW-15780
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 7.0.0
> Environment: Ubuntu 20.04
>Reporter: Sifang Li
>Priority: Blocker
>
> I am following the instructions for writing a table to a Parquet file:
> [https://arrow.apache.org/docs/cpp/parquet.html]
> This requires #include "parquet/arrow/writer.h".
> Apparently one header file is missing from the source tree; I cannot find it anywhere:
> In file included from ../3rd_party/arrow/cpp/src/parquet/arrow/writer.h:24,
> ...
> ../3rd_party/arrow/cpp/src/parquet/properties.h:31:10: fatal error: 
> parquet/parquet_version.h: No such file or directory
>    31 | #include "parquet/parquet_version.h"





[jira] [Commented] (ARROW-15780) missing header file parquet/parquet_version.h

2022-02-24 Thread Sifang Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497677#comment-17497677
 ] 

Sifang Li commented on ARROW-15780:
---

Thanks - I will close this.



[jira] [Commented] (ARROW-15780) missing header file parquet/parquet_version.h

2022-02-24 Thread Sifang Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497657#comment-17497657
 ] 

Sifang Li commented on ARROW-15780:
---

I just ran the following (from the manual configuration instructions):

$ mkdir build-release
$ cd build-release
$ cmake ..
$ make -j8       # if you have 8 CPU cores, otherwise adjust



[jira] [Commented] (ARROW-15780) missing header file parquet/parquet_version.h

2022-02-24 Thread Sifang Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497647#comment-17497647
 ] 

Sifang Li commented on ARROW-15780:
---

It looks like an installation issue. I followed the manual build instructions 
at:
[https://github.com/apache/arrow/blob/master/docs/source/developers/cpp/building.rst]

The libraries build fine in the out-of-source build directory, but 
parquet_version.h is missing. There is a corresponding .in file, but apparently 
the build process did not convert it to a .h file.

My CMake version is 3.16.3. Could that be the cause?
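
The remark about the .in file can be illustrated with a small sketch. A file with a .h.in suffix is typically a template that the CMake configure step expands into the real header (CMake's configure_file command replaces @NAME@ placeholders), usually writing the result into the build directory rather than the source tree. The C++ below is only a simplified stand-in for that substitution, not CMake's or Arrow's actual code; the function name ConfigureTemplate and the template line in the usage note are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Simplified model of placeholder expansion: every @NAME@ in `tmpl` is
// replaced by vars[NAME]; names missing from `vars` expand to an empty
// string. Assumption: any text between two '@' characters is treated as a
// name, which is looser than what CMake itself accepts.
std::string ConfigureTemplate(const std::string& tmpl,
                              const std::map<std::string, std::string>& vars) {
  std::string out;
  std::size_t pos = 0;
  while (pos < tmpl.size()) {
    const std::size_t open = tmpl.find('@', pos);
    if (open == std::string::npos) {
      out.append(tmpl, pos, std::string::npos);  // no more placeholders
      break;
    }
    const std::size_t close = tmpl.find('@', open + 1);
    if (close == std::string::npos) {
      out.append(tmpl, pos, std::string::npos);  // lone '@': copy verbatim
      break;
    }
    out.append(tmpl, pos, open - pos);  // literal text before the placeholder
    const std::string name = tmpl.substr(open + 1, close - open - 1);
    const auto it = vars.find(name);
    if (it != vars.end()) {
      out += it->second;
    }
    pos = close + 1;  // continue after the closing '@'
  }
  return out;
}
```

For example, ConfigureTemplate("#define PARQUET_VERSION_MAJOR @PARQUET_VERSION_MAJOR@\n", {{"PARQUET_VERSION_MAJOR", "7"}}) yields "#define PARQUET_VERSION_MAJOR 7\n". If the configure step never processes the Parquet directory (in Arrow's C++ build, Parquet is an optional component enabled with -DARROW_PARQUET=ON), no such header would be generated. Whether that was the cause here is not stated in this thread.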



[jira] [Created] (ARROW-15780) missing header file parquet/parquet_version.h

2022-02-24 Thread Sifang Li (Jira)
Sifang Li created ARROW-15780:
-

 Summary: missing header file parquet/parquet_version.h
 Key: ARROW-15780
 URL: https://issues.apache.org/jira/browse/ARROW-15780
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 7.0.0
 Environment: Ubuntu 20.04
Reporter: Sifang Li


I am following the instructions for writing a table to a Parquet file:
[https://arrow.apache.org/docs/cpp/parquet.html]

This requires #include "parquet/arrow/writer.h".

Apparently one header file is missing from the source tree; I cannot find it anywhere:

In file included from ../3rd_party/arrow/cpp/src/parquet/arrow/writer.h:24,

...
../3rd_party/arrow/cpp/src/parquet/properties.h:31:10: fatal error: 
parquet/parquet_version.h: No such file or directory
   31 | #include "parquet/parquet_version.h"


