Thanks Cheng, Nong.
The data in the matrix is homogeneous (all cells are booleans), so I don't
expect to run into memory-related issues. Is the limitation on the # of
columns itself, or is it memory pressure caused by the # of columns? To me
it sounds more like a memory issue.
On Mon, Jan 25, 2016 at 10:16 AM, Cheng Lian wrote:
Aside from Nong's comment, I think PARQUET-222, where we discussed a
performance issue with writing wide tables, may be helpful.
Cheng
On 1/23/16 4:53 PM, Nong Li wrote:
I expect this to be difficult. This is roughly 3 orders of magnitude more
than even a typical wide table use case.
Answers inline.
PARQUET-222 is mostly a memory issue caused by the # of columns. On the
write path, each column comes with its own write buffers, and their combined
size can grow very large. In the case investigated in PARQUET-222, it took
more than 10 GB to write a single row consisting of 26k integer
columns. I.e., this
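To make that scaling concrete, here is a rough back-of-envelope sketch. It assumes (my assumption, not something measured in this thread) that write-buffer memory grows roughly linearly with the number of columns and is largely independent of the cell type, which is why boolean cells alone would not keep memory small. "10G" is treated as 10 GiB, the function names are hypothetical, and since the PARQUET-222 figure is "more than 10G", every number it produces is a lower bound.

```python
# Back-of-envelope estimate from the PARQUET-222 figures: more than 10 GiB of
# write-path memory for a single row of 26k integer columns.
# Assumption: buffer memory scales roughly linearly with the column count and
# depends little on the cell type (boolean vs. integer).

GIB = 1024 ** 3


def per_column_overhead(total_bytes: float, num_columns: int) -> float:
    """Implied average write-buffer memory per column, in bytes (hypothetical helper)."""
    return total_bytes / num_columns


def estimate_write_memory(num_columns: int, overhead_per_column: float) -> float:
    """Estimated write-path memory for num_columns columns, in bytes (hypothetical helper)."""
    return num_columns * overhead_per_column


overhead = per_column_overhead(10 * GIB, 26_000)
print(f"~{overhead / 1024:.0f} KiB per column (lower bound)")

# Hypothetical column counts, for illustration only.
for cols in (1_000, 26_000, 100_000):
    est = estimate_write_memory(cols, overhead)
    print(f"{cols:>7} columns -> ~{est / GIB:.1f} GiB (lower bound)")
```

Under the linear assumption, a matrix with several times more columns would need proportionally more buffer memory, regardless of how small each cell is.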
On Thu, Jan 21, 2016 at 2:10 PM, Krishna wrote:
> We are considering using Parquet for storing a matrix that is dense and
> very, very wide