Thanks Cheng, Nong.

Data in the matrix is homogeneous (all cells are booleans), so I don't expect
to face memory-related issues. Is the limitation on the number of columns
itself, or is it memory issues caused by the number of columns? To me it
sounds more like memory

On Mon, Jan 25, 2016 at 10:16 AM, Cheng Lian <> wrote:

> Aside from Nong's comment, I think PARQUET-222, where we discussed a
> performance issue of writing wide tables, can be helpful.
> Cheng
> On 1/23/16 4:53 PM, Nong Li wrote:
>> I expect this to be difficult. This is roughly 3 orders of magnitude more
>> than even a typical wide table use case.
>>
>> Answers inline.
>> On Thu, Jan 21, 2016 at 2:10 PM, Krishna <> wrote:
>>
>>> We are considering using Parquet for storing a matrix that is dense and
>>> very, very wide (can have more than 600K columns).
>>>
>>> I have the following questions:
>>>
>>>     - Is there a limit on the # of columns in a Parquet file? We expect to
>>>     query [10-100] columns at a time using Spark - what are the
>>>     performance implications in this scenario?
>> There is no hard limit but I think you'll probably run into some issues.
>> There will probably be code paths that are not optimized for schemas this
>> big but I expect those to be easier to address. The default configurations
>> will probably not work well (the metadata to data ratio would be bad). You
>> can try configuring very large row groups and see how that goes.
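
To make the metadata-to-data concern concrete, here is a rough back-of-the-envelope sketch. It is not taken from the thread: it assumes booleans are stored at 1 bit per value with no compression, and it uses a 128 MiB row group as the starting point (the parquet-mr default, as far as I know). The function name and the other row group sizes are only illustrative.

```python
# Rough sizing for a very wide boolean matrix stored in Parquet.
# Assumptions (illustrative, not from the thread): 1 bit per boolean
# value, no compression, no encoding overhead.

def column_chunk_stats(num_columns, row_group_bytes):
    """Return (rows per row group, raw bytes per column chunk)."""
    bytes_per_row = num_columns / 8        # one bit per boolean cell
    rows = int(row_group_bytes // bytes_per_row)
    bytes_per_chunk = rows / 8             # each chunk holds one bit per row
    return rows, bytes_per_chunk


if __name__ == "__main__":
    for mib in (128, 1024, 4096):
        rows, chunk = column_chunk_stats(600_000, mib * 1024 * 1024)
        print(f"{mib} MiB row group: {rows} rows, "
              f"~{chunk:.0f} raw bytes per column chunk")
```

Under these assumptions a 128 MiB row group holds only about 1,789 rows, so each of the 600K column chunks carries roughly 224 bytes of raw data while still needing its own metadata entry in the footer. That is one way to read the suggestion to try much larger row groups.
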
>>>     - We want a schema-less solution since the matrix can get wider over a
>>>     period of time
>>>     - Is there a way to generate such wide structured schema-less Parquet
>>>     files using map-reduce (input files are in custom binary format)?
>> No, Parquet requires a schema. The schema is flexible so you could map your
>> schema to a Parquet schema (each column could be binary, for example). Why
>> are you looking to use Parquet for this use case?
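
Since a schema is required, one option is to generate it from the current matrix width. The following is a minimal sketch that builds a Parquet message-type schema string with one boolean field per matrix column. The helper name `wide_boolean_schema`, the `c<i>` column naming, and the choice of `optional` fields are my own assumptions for illustration, not anything prescribed by Parquet or by this thread.

```python
# Sketch: generate a Parquet message-type schema for an N-column boolean
# matrix. Names (wide_boolean_schema, "matrix", "c0", "c1", ...) are
# hypothetical; adapt them to the real column identifiers.

def wide_boolean_schema(num_columns, name="matrix", prefix="c"):
    """Return a schema string with one optional boolean field per column."""
    fields = "\n".join(
        f"  optional boolean {prefix}{i};" for i in range(num_columns)
    )
    return f"message {name} {{\n{fields}\n}}"


if __name__ == "__main__":
    # A tiny width keeps the output readable; the real case would pass
    # the current number of matrix columns.
    print(wide_boolean_schema(3))
```

When the matrix gets wider, calling the helper with a larger `num_columns` produces a schema that only appends fields to the previous one, which matches the "append a column" style of schema evolution discussed further down.
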
>>>     - HBase can support millions of columns - anyone with prior experience
>>>     that compares Parquet vs HFile performance for wide structured tables?
>>>     - Does Impala have support for evolving schema?
>> Yes. Different systems have different rules on what is allowed but the case
>> of appending a column to an existing schema should be well supported.
>>> Krishna
