Re: Parquet for very wide table

2016-01-25 Thread Krishna
Thanks Cheng, Nong. Data in the matrix is homogeneous (cells are booleans), so I don't expect to face memory-related issues. Is the limitation on the # of columns itself, or on memory issues caused by the # of columns? To me it sounds more like memory issues. On Mon, Jan 25, 2016 at 10:16 AM, Cheng Lian

Re: Parquet for very wide table

2016-01-25 Thread Cheng Lian
Aside from Nong's comment, I think PARQUET-222, where we discussed a performance issue of writing wide tables, can be helpful. Cheng On 1/23/16 4:53 PM, Nong Li wrote: I expect this to be difficult. This is roughly 3 orders of magnitude more than even a typical wide table use case. Answers

Re: Parquet for very wide table

2016-01-25 Thread Cheng Lian
PARQUET-222 is mostly a memory issue caused by the # of columns. On the write path, each column comes with its own write buffers, and these can add up to a large amount of memory. In the case investigated in PARQUET-222, it took more than 10G to write a single row consisting of 26k integer columns. I.e., this
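
As a rough back-of-the-envelope reading of the PARQUET-222 figure quoted above, the sketch below divides the reported memory by the reported column count to get an average write-side footprint per column. It assumes "10G" means about 10 GiB and "26k" means 26,000 columns; neither is stated precisely in the message, and the function names are hypothetical helpers, not part of any Parquet API. It also assumes the footprint scales linearly with the column count, which is a simplification.

```python
GIB = 1024 ** 3


def per_column_bytes(total_bytes: float, num_columns: int) -> float:
    """Average write-buffer footprint per column, in bytes."""
    if num_columns <= 0:
        raise ValueError("num_columns must be positive")
    return total_bytes / num_columns


def projected_total_bytes(bytes_per_column: float, num_columns: int) -> float:
    """Naive linear projection of total write memory for a given column count."""
    return bytes_per_column * num_columns


# Figures from the message (interpreted as stated in the lead-in; assumptions).
observed_total = 10 * GIB      # "more than 10G"
observed_columns = 26_000      # "26k integer columns"

avg = per_column_bytes(observed_total, observed_columns)
print(f"~{avg / 1024:.0f} KiB of write buffers per column on average")

# Doubling the column count doubles the projected footprint under this model.
doubled = projected_total_bytes(avg, 2 * observed_columns)
print(f"~{doubled / GIB:.0f} GiB projected for {2 * observed_columns} columns")
```

Under these assumptions the average works out to roughly 400 KiB per column, which is why the column count, rather than the cell type, dominates memory on the write path.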

Re: Parquet for very wide table

2016-01-23 Thread Nong Li
I expect this to be difficult. This is roughly 3 orders of magnitude more than even a typical wide table use case. Answers inline. On Thu, Jan 21, 2016 at 2:10 PM, Krishna wrote: > We are considering using Parquet for storing a matrix that is dense and > very, very wide