Hi Antoni,

First of all, let me say that indeed that statement is kind of confusing
and in some cases wrong.

The best way to think of the 2GB Java limit is with respect to the amount
of serialized metadata that the catalog server needs to send at any point
in time. There are operations that require a single table to be serialized,
so in that case the 2GB limit is "applied" at the table level. However,
there are other operations, such as sending a catalog topic update to the
statestore. In that case, the catalog may serialize multiple tables
(depending on whether there was a change in metadata since the last update)
into a single Thrift message. Hence, in this case the limit is applied to
the cumulative size of serialized table metadata. That said, in Impala
v2.12 we're changing the behavior so that the 2GB limit is always applied
at a single table.

Hope that helps.
Dimitris

On Thu, Mar 22, 2018 at 12:04 PM, Antoni Ivanov <[email protected]> wrote:

> Hi,
>
> I was reading https://www.cloudera.com/documentation/enterprise/5-11-
> x/topics/impala_partitioning.html#partition_stats
> And noticed this warning "If this metadata for all tables combined exceeds
> 2 GB, you might experience service downtime.”
>
> But I thought that this limit applies for single table. Having large
> number of partitions per table can be an issue because each metadata
> operation for this table would require the whole metadata be updated (as
> far as I understand Impala doesn’t update partially the metadata for only
> the changed partitions) Also because of Java serialization limitation where
> you cannot serialize it to more than 1G or 2G
> I am guessing this is related to https://issues.apache.org/
> jira/browse/IMPALA-5058 - but that’s only if you do frequent DDL
> operations I suppose.
>
> Am I understanding things so far correctly. So why can there be service
> downtime in this case ?
>
> Thanks,
> Antoni

Reply via email to