[jira] [Commented] (IMPALA-5990) End-to-end compression of metadata
[ https://issues.apache.org/jira/browse/IMPALA-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506446#comment-16506446 ]

Tianyi Wang commented on IMPALA-5990:
-------------------------------------

Today I learned that a thrift message larger than 4GB can be used with TBufferedTransport and TBinaryProtocol. The limits lie elsewhere: TMemoryBuffer cannot handle a message larger than 4GB, thrift cannot handle a single std::string larger than 4GB, etc. So after IMPALA-5990, we have seen a ~6GB compressed catalog and it works just fine.

> End-to-end compression of metadata
> ----------------------------------
>
>                 Key: IMPALA-5990
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5990
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog, Frontend
>    Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
>            Reporter: Alexander Behm
>            Assignee: Tianyi Wang
>            Priority: Critical
>             Fix For: Impala 2.12.0
>
>
> The metadata of large tables can become quite big, making it costly to hold in
> the statestore and disseminate to coordinator impalads. The metadata can even
> get so big that fundamental limits like the JVM 2GB array size and the Thrift
> 4GB limit are hit and lead to downtime.
> For reducing the statestore metadata topic size we have an existing
> "compact_catalog_topic" flag which LZ4 compresses the metadata payload for
> the C++ codepaths catalogd->statestore and statestore->impalad.
> Unfortunately, the metadata is not compressed in the same way during the
> FE->BE transition on the catalogd and the BE->FE transition on the impalad.
> The goal of this change is to enable end-to-end compression for the full path
> of metadata dissemination. The existing code paths also need significant
> cleanup/streamlining. Ideally, the new code should provide consistent size
> limits everywhere.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
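The two size limits named in the issue description above (the JVM 2GB array size and the Thrift 4GB limit) can be made concrete with a little arithmetic. The sketch below is illustration only: the constant values and the function name exceeded_limits are assumptions, not taken from Impala or Thrift source. It simply reports which of the two assumed ceilings a single contiguous buffer of a given size would exceed.

```python
# Sketch only: the constants below are assumptions for illustration,
# not values read from Impala or Thrift source code.
GIB = 1024 ** 3

# Assumed ceiling for a single Java array: indices are signed 32-bit ints.
JVM_ARRAY_LIMIT = 2 ** 31 - 1
# Assumed ceiling for anything whose length is stored in an unsigned 32-bit field.
UINT32_LIMIT = 2 ** 32 - 1


def exceeded_limits(payload_bytes):
    """Return the names of the assumed limits that a buffer of this size exceeds."""
    exceeded = []
    if payload_bytes > JVM_ARRAY_LIMIT:
        exceeded.append("jvm_array_2gb")
    if payload_bytes > UINT32_LIMIT:
        exceeded.append("uint32_4gb")
    return exceeded


if __name__ == "__main__":
    # Hypothetical buffer sizes, chosen to fall below, between, and above the limits.
    for size_gib in (1, 3, 6):
        print(size_gib, "GiB ->", exceeded_limits(size_gib * GIB))
```

The check applies to one contiguous buffer. As the comment above notes, a whole thrift message sent through TBufferedTransport and TBinaryProtocol is not bound by the 4GB figure, so a large catalog can still pass as long as no single buffer or string crosses it.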
[jira] [Commented] (IMPALA-5990) End-to-end compression of metadata
[ https://issues.apache.org/jira/browse/IMPALA-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467798#comment-16467798 ]

Tianyi Wang commented on IMPALA-5990:
-------------------------------------

[~lv] There isn't a concrete plan yet.
[jira] [Commented] (IMPALA-5990) End-to-end compression of metadata
[ https://issues.apache.org/jira/browse/IMPALA-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467660#comment-16467660 ]

Alexander Behm commented on IMPALA-5990:
----------------------------------------

[~arodoni_cloudera], this is an improvement of an existing configuration "--compact_catalog_topic", so it is not a new user-facing feature. That said, it would be nice to briefly mention it in the release notes as an improvement to metadata handling.
[jira] [Commented] (IMPALA-5990) End-to-end compression of metadata
[ https://issues.apache.org/jira/browse/IMPALA-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467593#comment-16467593 ]

Lars Volker commented on IMPALA-5990:
-------------------------------------

[~tianyiwang] - Is there a plan to address the 2GB uncompressed catalog size limit? I looked for a Jira but couldn't find one for that specifically.