Re: Parquet for very wide table

2016-01-25 Thread Krishna
Thanks Cheng, Nong. Data in the matrix is homogeneous (cells are booleans), so I don't expect to face memory-related issues. Is the limitation on the # of columns or memory issues caused by the # of columns? To me it sounds more like memory issues. On Mon, Jan 25, 2016 at 10:16 AM, Cheng Lian

[jira] [Commented] (PARQUET-462) Create a new Level class for definition and repetition values

2016-01-25 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115790#comment-15115790 ] Wes McKinney commented on PARQUET-462: -- Could you explain this in more detail, especially in the

[jira] [Commented] (PARQUET-462) Create a new Level class for definition and repetition values

2016-01-25 Thread Deepak Majeti (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115868#comment-15115868 ] Deepak Majeti commented on PARQUET-462: --- The code duplication will happen during the initialization

[jira] [Updated] (PARQUET-461) Improve ColumnReader API

2016-01-25 Thread Deepak Majeti (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Majeti updated PARQUET-461: -- Description: I would like to add some more extensions to the ColumnReader API. These

Re: Parquet for very wide table

2016-01-25 Thread Cheng Lian
Aside from Nong's comment, I think PARQUET-222, where we discussed a performance issue of writing wide tables, can be helpful. Cheng On 1/23/16 4:53 PM, Nong Li wrote: I expect this to be difficult. This is roughly 3 orders of magnitude more than even a typical wide table use case. Answers

[jira] [Commented] (PARQUET-433) Specialize ColumnReaders based on the column type

2016-01-25 Thread Aliaksei Sandryhaila (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115751#comment-15115751 ] Aliaksei Sandryhaila commented on PARQUET-433: -- Yes, your commit looks very close. Thanks

[jira] [Created] (PARQUET-462) Create a new Level class for definition and repetition values

2016-01-25 Thread Deepak Majeti (JIRA)
Deepak Majeti created PARQUET-462: - Summary: Create a new Level class for definition and repetition values Key: PARQUET-462 URL: https://issues.apache.org/jira/browse/PARQUET-462 Project: Parquet

Re: Parquet-cpp

2016-01-25 Thread Aliaksei Sandryhaila
Hi Ryan, This sounds very reasonable. I do not argue to disregard the standard Apache approach to promoting contributors to committers. I am just pointing out that without the input from current committers it is hard for us to productively contribute to the project. As a consequence, it is

Re: Parquet for very wide table

2016-01-25 Thread Cheng Lian
PARQUET-222 is mostly a memory issue caused by the # of columns. On the write path, each column comes with write buffers, and they can accumulate to a large amount. In the case investigated in PARQUET-222, it took more than 10G to write a single row consisting of 26k integer columns. I.e., this

[jira] [Commented] (PARQUET-462) Create a new Level class for definition and repetition values

2016-01-25 Thread Deepak Majeti (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115820#comment-15115820 ] Deepak Majeti commented on PARQUET-462: --- The main idea is to prevent code duplication between

[jira] [Resolved] (PARQUET-238) Unable to Install C++ Driver - reference to 'share_ptr' is ambiguous

2016-01-25 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved PARQUET-238. -- Resolution: Resolved This is resolved with PARQUET-418 and PARQUET-267. Please let us know if

[jira] [Comment Edited] (PARQUET-238) Unable to Install C++ Driver - reference to 'share_ptr' is ambiguous

2016-01-25 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116456#comment-15116456 ] Wes McKinney edited comment on PARQUET-238 at 1/26/16 1:20 AM: --- This is

[jira] [Created] (PARQUET-464) Add cmake option and #defines to enable/disable struct packing

2016-01-25 Thread Wes McKinney (JIRA)
Wes McKinney created PARQUET-464: Summary: Add cmake option and #defines to enable/disable struct packing Key: PARQUET-464 URL: https://issues.apache.org/jira/browse/PARQUET-464 Project: Parquet

Re: Parquet-cpp

2016-01-25 Thread Wes McKinney
I am happy to help out with the patch maintenance when there are conflicts. With PARQUET-437 we'll want to write more unit tests which will help make sure we aren't breaking each other's code. On Mon, Jan 25, 2016 at 2:33 PM, Aliaksei Sandryhaila wrote: > Hi Ryan, > > This

[jira] [Commented] (PARQUET-449) Update to latest parquet.thrift

2016-01-25 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15116337#comment-15116337 ] Wes McKinney commented on PARQUET-449: -- [~nongli] the GitHub PR is still outstanding > Update to

Re: Parquet-cpp

2016-01-25 Thread Ryan Blue
Aliaksei, thanks for being understanding here. I agree with you that it is too difficult. We really want to get the cpp side bootstrapped as soon as possible. Let's go with what you suggested, to have contributors review one another's patches and then ask a committer for a final review once