[jira] [Assigned] (KUDU-3357) Allow servers to not use the advertised RPC addresses
[ https://issues.apache.org/jira/browse/KUDU-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong reassigned KUDU-3357: - Assignee: Andrew Wong > Allow servers to not use the advertised RPC addresses > - > > Key: KUDU-3357 > URL: https://issues.apache.org/jira/browse/KUDU-3357 > Project: Kudu > Issue Type: Improvement > Components: rpc >Reporter: Andrew Wong >Assignee: Andrew Wong >Priority: Major > > When Kudu servers are deployed within an internal network with internal > hostnames (e.g. in a k8s cluster), and Kudu clients are deployed outside of > this network with a mapping of external traffic to internal ports (e.g. with > a load balancer), it’s unclear how to route the Kudu client to the servers > without having all traffic (including RPCs between servers) use publicly > accessible addresses. > For instance, all servers could be configured with the > --rpc_advertised_addresses configuration. However, since these addresses are > used to register servers with the Master, not only would they be used to > indicate where clients should look for data, but they would also be used to > indicate where replicas should heartbeat to other replicas. This would induce > a great deal of traffic on the load balancer. > We should consider allowing “internal” (i.e. tserver and master) traffic to > bypass advertised addresses and use an alternate address. Or at the very > least, introduce a policy for selecting which advertised address to use > depending on what is available (currently, we always use the first in the list). -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (KUDU-3061) Balance tablet leaders across TServers
[ https://issues.apache.org/jira/browse/KUDU-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong reassigned KUDU-3061: - Assignee: shenxingwuying > Balance tablet leaders across TServers > -- > > Key: KUDU-3061 > URL: https://issues.apache.org/jira/browse/KUDU-3061 > Project: Kudu > Issue Type: New Feature > Components: api, tablet >Affects Versions: 1.11.1 >Reporter: Adam Voga >Assignee: shenxingwuying >Priority: Major > Labels: performance, roadmap-candidate, scalability > > The number of leader tablets per tablet server can become imbalanced over > time, putting additional pressure on a few nodes. > A CLI tool or an extension to the existing balancer should be added to take > care of this. > Currently the only option is running leader_step_down manually. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (KUDU-3357) Allow servers to not use the advertised RPC addresses
Andrew Wong created KUDU-3357: - Summary: Allow servers to not use the advertised RPC addresses Key: KUDU-3357 URL: https://issues.apache.org/jira/browse/KUDU-3357 Project: Kudu Issue Type: Improvement Components: rpc Reporter: Andrew Wong When Kudu servers are deployed within an internal network with internal hostnames (e.g. in a k8s cluster), and Kudu clients are deployed outside of this network with a mapping of external traffic to internal ports (e.g. with a load balancer), it’s unclear how to route the Kudu client to the servers without having all traffic (including RPCs between servers) use publicly accessible addresses. For instance, all servers could be configured with the --rpc_advertised_addresses configuration. However, since these addresses are used to register servers with the Master, not only would they be used to indicate where clients should look for data, but they would also be used to indicate where replicas should heartbeat to other replicas. This would induce a great deal of traffic on the load balancer. We should consider allowing “internal” (i.e. tserver and master) traffic to bypass advertised addresses and use an alternate address. Or at the very least, introduce a policy for selecting which advertised address to use depending on what is available (currently, we always use the first in the list). -- This message was sent by Atlassian Jira (v8.20.1#820001)
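[Editorial illustration] To show why the advertised addresses matter to external clients, here is a minimal sketch using the Kudu Java client; the load-balancer hostname and table name are hypothetical. The locations a client receives are whatever addresses the tablet servers registered with the Master (their bind addresses, or --rpc_advertised_addresses if set), which is why externally deployed clients cannot reach servers that register only internal hostnames.
{code:java}
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduTable;
import org.apache.kudu.client.LocatedTablet;

public class PrintTabletLocations {
  public static void main(String[] args) throws Exception {
    // Hypothetical externally routable master endpoint (e.g. behind a load balancer)
    // and a hypothetical table name.
    try (KuduClient client = new KuduClient.KuduClientBuilder("kudu-lb.example.com:7051").build()) {
      KuduTable table = client.openTable("my_table");
      // The host/port pairs printed here are whatever the tablet servers registered
      // with the master. If those are internal k8s hostnames, a client outside the
      // network cannot reach them, which is the routing problem described in this issue.
      for (LocatedTablet tablet : table.getTabletsLocations(10000)) {
        for (LocatedTablet.Replica replica : tablet.getReplicas()) {
          System.out.println(replica.getRpcHost() + ":" + replica.getRpcPort());
        }
      }
    }
  }
}
{code}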
[jira] [Commented] (KUDU-3353) Support setnx semantic on column
[ https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494849#comment-17494849 ] Andrew Wong commented on KUDU-3353: --- Upon first reading this, I thought it sounded similar to INSERT_IGNORE, but after letting it digest a bit, it seems a bit different since it deals with individual cells of an updated row, rather than the entire row. The tricky thing here, I think, is that we want to evaluate the value of an updated column before determining whether to apply the update. This is not something Kudu currently supports – we currently only check primary key presence before applying the row. And note that determining the old value may entail opening up several delta files. While not untenable (e.g., we still open delta files for the presence check to determine if a row was deleted), that is something that would need to be implemented as a part of this operation. Another thought: would it make sense to introduce this as a new write op entirely, some SETNX (similar to INSERT_IGNORE), rather than as a part of the schema? > Support setnx semantic on column > > > Key: KUDU-3353 > URL: https://issues.apache.org/jira/browse/KUDU-3353 > Project: Kudu > Issue Type: New Feature > Components: api, server >Reporter: Yingchun Lai >Priority: Major > > h1. Motivation > In some usage scenarios, a Kudu table has a column with the semantic of "create > time", which means it represents the creation timestamp of the row. The other > columns have the same semantics as before, for example, user properties > like age, address, etc. > The upstream and the Kudu user don't know whether a row exists or not, and every > cell's data is the latest ingested from, for example, an event stream. > Without the "create time" column, the Kudu user can use UPSERT operations to > write data to the table, and every column with data will overwrite the old data. > But with the "create time" column, the cell data would be overwritten by > subsequent UPSERT ops, which is not what we expect. > To achieve the goal, we have to read the column out to judge whether the > column is NULL or not; if it's NULL, we can fill the row with the cell, and if > not NULL, we drop it from the data before the UPSERT, to avoid overwriting > "create time". > This is expensive; is there a way to avoid a read from Kudu? > h1. Resolution > We can implement a column schema with the semantic of "update if null". That means > cell data in the changelist will update the base data if the latter is NULL, and > updates will be ignored if it is not NULL. > So we can use Kudu similarly as before, only defining the column as > "update if null" when creating the table or adding the column. > -- This message was sent by Atlassian Jira (v8.20.1#820001)
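[Editorial illustration] To make the cost of the read-before-UPSERT workaround described in this issue concrete, here is a minimal sketch using the Kudu Java client; the master address, table name, and column names (user_id, age, create_time) are hypothetical. The extra scan before every UPSERT is exactly the round trip the reporter wants to avoid.
{code:java}
import java.util.Collections;
import org.apache.kudu.client.*;

public class UpsertPreservingCreateTime {
  public static void main(String[] args) throws Exception {
    // Hypothetical master address, table name, and column names.
    try (KuduClient client = new KuduClient.KuduClientBuilder("master-1:7051").build()) {
      KuduTable table = client.openTable("user_profile");

      // Step 1: read the existing row to see whether "create_time" is already set.
      KuduPredicate keyPred = KuduPredicate.newComparisonPredicate(
          table.getSchema().getColumn("user_id"), KuduPredicate.ComparisonOp.EQUAL, 42L);
      KuduScanner scanner = client.newScannerBuilder(table)
          .setProjectedColumnNames(Collections.singletonList("create_time"))
          .addPredicate(keyPred)
          .build();
      boolean createTimeAlreadySet = false;
      while (scanner.hasMoreRows()) {
        for (RowResult row : scanner.nextRows()) {
          createTimeAlreadySet = !row.isNull("create_time");
        }
      }

      // Step 2: build the UPSERT, omitting "create_time" if it must not be overwritten.
      Upsert upsert = table.newUpsert();
      PartialRow r = upsert.getRow();
      r.addLong("user_id", 42L);
      r.addInt("age", 30);
      if (!createTimeAlreadySet) {
        r.addLong("create_time", System.currentTimeMillis());
      }
      KuduSession session = client.newSession();
      session.apply(upsert);
      session.close();
    }
  }
}
{code}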
[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports
[ https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451489#comment-17451489 ] Andrew Wong commented on KUDU-3326: --- That just about summarizes everything. Thanks for the recap! > Add Soft Delete Table Supports > -- > > Key: KUDU-3326 > URL: https://issues.apache.org/jira/browse/KUDU-3326 > Project: Kudu > Issue Type: New Feature > Components: api, CLI, client, master, test >Reporter: dengke >Assignee: dengke >Priority: Major > > h2. Brief description: > Soft delete means that the kudu system will not delete the table > immediately after receiving the command to delete the table. Instead, it will > mark the table and set a validity period. After the validity period, will try > again to determine whether the table really needs to be deleted. > This feature can restore data conveniently and timely in the case of > accidental deletion. > h2. Relevant modification points: > 1. After deleting a table, the original table name will be renamed as > KUDU_TRASHED: < timestamp >: < original table name >, which becomes a trash > table. > 2. The contents of the trash table are exactly the same as those of the > original table. Although it cannot be renamed, added or deleted directly, > it can be read and written normally. The trash table will be retained for a > period of time by default (such as 7 days, which can be modified through > parameters). The compact priority of the trash table will be set to the > lowest to save the system resources. > 3. The master needs to add a thread to process expired trash tables and > perform real deletion. > 4. It is allowed to create a table with the same name as the original table, > and the newly created table with the same name can be deleted normally. > 5. It is allowed to recall deleted tables, but the following two situations > cannot be recalled: the same original table name exists and the trash table > has expired. > 6. The KUDU_TRASHED is a reserved string for the system. Users are not > allowed to create a table with table names starting with KUDU_TRASHED. > 7. Kudu tool adaptation soft deletion. > 8. Java API adaptation soft deletion. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports
[ https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450798#comment-17450798 ] Andrew Wong commented on KUDU-3326: --- Alternatively, to avoid the whole question of naming convention, rather than relying on table renames (which incurs some IO on the tablet servers to persist metadata), we could introduce a separate list of trashed tables to the catalog manager that isn't visible to users via normal {{ListTable}} and {{OpenTable}} calls. When loading all the tables to memory, based on whether the table has a "trashed_time" field, Kudu could move the table into a separate container (i.e. not {{table_ids_map_}} or {{normalized_table_names_map_}}). When recalling, we could have users recall by specifying a table ID, and potentially giving a new name. > Add Soft Delete Table Supports > -- > > Key: KUDU-3326 > URL: https://issues.apache.org/jira/browse/KUDU-3326 > Project: Kudu > Issue Type: New Feature > Components: api, CLI, client, master, test >Reporter: dengke >Assignee: dengke >Priority: Major > > h2. Brief description: > Soft delete means that the kudu system will not delete the table > immediately after receiving the command to delete the table. Instead, it will > mark the table and set a validity period. After the validity period, will try > again to determine whether the table really needs to be deleted. > This feature can restore data conveniently and timely in the case of > accidental deletion. > h2. Relevant modification points: > 1. After deleting a table, the original table name will be renamed as > KUDU_TRASHED: < timestamp >: < original table name >, which becomes a trash > table. > 2. The contents of the trash table are exactly the same as those of the > original table. Although it cannot be renamed, added or deleted directly, > it can be read and written normally. The trash table will be retained for a > period of time by default (such as 7 days, which can be modified through > parameters). The compact priority of the trash table will be set to the > lowest to save the system resources. > 3. The master needs to add a thread to process expired trash tables and > perform real deletion. > 4. It is allowed to create a table with the same name as the original table, > and the newly created table with the same name can be deleted normally. > 5. It is allowed to recall deleted tables, but the following two situations > cannot be recalled: the same original table name exists and the trash table > has expired. > 6. The KUDU_TRASHED is a reserved string for the system. Users are not > allowed to create a table with table names starting with KUDU_TRASHED. > 7. Kudu tool adaptation soft deletion. > 8. Java API adaptation soft deletion. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports
[ https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450792#comment-17450792 ] Andrew Wong commented on KUDU-3326: --- Sorry for the late response here! {quote}So in your opinion, only one trash table is allowed to exist to meet our design requirements? {quote} I wouldn't be against it, at least with the caveats mentioned. That said, while we're thinking about design here, I do think it wouldn't be too difficult to come up with a naming convention that does satisfy uniqueness constraints. For instance, we could add the creation timestamp to the trashed table's name, or better yet, the table ID. E.g. instead of KUDU_TRASH:A, we could name it {{KUDU_TRASH:<timestamp>:A}} or {{KUDU_TRASH:<table ID>:A}}. {quote} This function can be distinguished by adding commands parameters, or it is more convenient to mark the trash table directly during list? {quote} I think adding an argument to the {{ListTables()}} API (or adding a new API with the argument) that opts into showing trashed tables seems reasonable. I think the default should be to not show them though, especially as they will not be visible to Impala. > Add Soft Delete Table Supports > -- > > Key: KUDU-3326 > URL: https://issues.apache.org/jira/browse/KUDU-3326 > Project: Kudu > Issue Type: New Feature > Components: api, CLI, client, master, test >Reporter: dengke >Assignee: dengke >Priority: Major > > h2. Brief description: > Soft delete means that the kudu system will not delete the table > immediately after receiving the command to delete the table. Instead, it will > mark the table and set a validity period. After the validity period, will try > again to determine whether the table really needs to be deleted. > This feature can restore data conveniently and timely in the case of > accidental deletion. > h2. Relevant modification points: > 1. After deleting a table, the original table name will be renamed as > KUDU_TRASHED: < timestamp >: < original table name >, which becomes a trash > table. > 2. The contents of the trash table are exactly the same as those of the > original table. Although it cannot be renamed, added or deleted directly, > it can be read and written normally. The trash table will be retained for a > period of time by default (such as 7 days, which can be modified through > parameters). The compact priority of the trash table will be set to the > lowest to save the system resources. > 3. The master needs to add a thread to process expired trash tables and > perform real deletion. > 4. It is allowed to create a table with the same name as the original table, > and the newly created table with the same name can be deleted normally. > 5. It is allowed to recall deleted tables, but the following two situations > cannot be recalled: the same original table name exists and the trash table > has expired. > 6. The KUDU_TRASHED is a reserved string for the system. Users are not > allowed to create a table with table names starting with KUDU_TRASHED. > 7. Kudu tool adaptation soft deletion. > 8. Java API adaptation soft deletion. -- This message was sent by Atlassian Jira (v8.20.1#820001)
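[Editorial illustration] For reference, a minimal sketch of table listing with today's Kudu Java client (the master address is hypothetical); the opt-in argument for showing trashed tables discussed above is only a proposal and does not exist in the client, so it appears below solely as a comment.
{code:java}
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.ListTablesResponse;

public class ListVisibleTables {
  public static void main(String[] args) throws Exception {
    // Hypothetical master address.
    try (KuduClient client = new KuduClient.KuduClientBuilder("master-1:7051").build()) {
      // Today's API: lists the tables the catalog exposes; under the proposal,
      // trashed tables would stay hidden from this call by default.
      ListTablesResponse resp = client.getTablesList();
      for (String name : resp.getTablesList()) {
        System.out.println(name);
      }
      // The opt-in argument discussed above (e.g. something like
      // getTablesList(filter, /*showTrashed=*/true)) is only a proposal and
      // is not part of the client API today.
    }
  }
}
{code}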
[jira] [Assigned] (KUDU-38) bootstrap should not replay logs that are known to be fully flushed
[ https://issues.apache.org/jira/browse/KUDU-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong reassigned KUDU-38: --- Assignee: Andrew Wong (was: Todd Lipcon) > bootstrap should not replay logs that are known to be fully flushed > --- > > Key: KUDU-38 > URL: https://issues.apache.org/jira/browse/KUDU-38 > Project: Kudu > Issue Type: Sub-task > Components: tablet >Affects Versions: M3 >Reporter: Todd Lipcon >Assignee: Andrew Wong >Priority: Major > Labels: data-scalability, roadmap-candidate, startup-time > > Currently the bootstrap process will process all of the log segments, > including those that can be trivially determined to contain only durable > edits. This makes startup unnecessarily slow. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports
[ https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442039#comment-17442039 ] Andrew Wong commented on KUDU-3326: --- {quote}Can we add a parameter to decide whether to eliminate (redundant) trash tables when deleting tables? If so, your elimination method will be very appropriate. Otherwise, the deletion will fails.{quote} I guess this is reasonable, to err on the side of less aggressive deletion. So if we trash table A, then create a new table A, and then trash A before KUDU_TRASH:A is deleted, the user would be met with an error. {quote}we can add a parameter to control the number of retained trash tables{quote} We can, but how would the naming convention for the multiple trashed tables work? And if we have multiple trashed tables of the same name, should we be able to recall any trashed version of a table? {quote}When HMS-synchronization is enabled,considering that HMS only manages metadata and has no impact on the timing data of kudu, it can be deleted directly from HMS. Rebuild it on the HMS during recall.{quote} Sounds reasonable. I guess we can iron out the HMS features separately, after the initial patch is merged. We'll also need to make sure that there's no confusion when listing tables. E.g. when listing, we shouldn't show any trashed tables unless the user explicitly asks to include trashed tables in the result. > Add Soft Delete Table Supports > -- > > Key: KUDU-3326 > URL: https://issues.apache.org/jira/browse/KUDU-3326 > Project: Kudu > Issue Type: New Feature > Components: api, CLI, client, master, test >Reporter: dengke >Assignee: dengke >Priority: Major > > h2. Brief description: > Soft delete means that the kudu system will not delete the table > immediately after receiving the command to delete the table. Instead, it will > mark the table and set a validity period. After the validity period, will try > again to determine whether the table really needs to be deleted. > This feature can restore data conveniently and timely in the case of > accidental deletion. > h2. Relevant modification points: > 1. After deleting a table, the original table name will be renamed as > KUDU_TRASHED: < timestamp >: < original table name >, which becomes a trash > table. > 2. The contents of the trash table are exactly the same as those of the > original table. Although it cannot be renamed, added or deleted directly, > it can be read and written normally. The trash table will be retained for a > period of time by default (such as 7 days, which can be modified through > parameters). The compact priority of the trash table will be set to the > lowest to save the system resources. > 3. The master needs to add a thread to process expired trash tables and > perform real deletion. > 4. It is allowed to create a table with the same name as the original table, > and the newly created table with the same name can be deleted normally. > 5. It is allowed to recall deleted tables, but the following two situations > cannot be recalled: the same original table name exists and the trash table > has expired. > 6. The KUDU_TRASHED is a reserved string for the system. Users are not > allowed to create a table with table names starting with KUDU_TRASHED. > 7. Kudu tool adaptation soft deletion. > 8. Java API adaptation soft deletion. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (KUDU-2681) Account for already-running tasks before sending new ones upon processing tablet reports
[ https://issues.apache.org/jira/browse/KUDU-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-2681. --- Fix Version/s: 1.12.0 Resolution: Duplicate This seems like a duplicate of KUDU-2992 > Account for already-running tasks before sending new ones upon processing > tablet reports > > > Key: KUDU-2681 > URL: https://issues.apache.org/jira/browse/KUDU-2681 > Project: Kudu > Issue Type: Bug > Components: master >Affects Versions: 1.7.1 >Reporter: Andrew Wong >Priority: Major > Labels: scalability, supportability > Fix For: 1.12.0 > > > I've seen a case where the master will reschedule the same delete tablet task > for a given tablet multiple times, e.g. because it received a new tablet > report that the tablet still exists on a given tserver. This results in > significant log-spam, and ends up sending excessive RPCs to the tablet > servers. Here are some master logs demonstrating this (note the repeated > attempt numbers): > > {{I0129 05:09:43.918886 22190 catalog_manager.cc:2922] Sending > DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet > 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 > (server:7050) (Replica with old config index 3048677 (current committed > config index is 3054594))}} > {{W0129 05:09:43.919509 22190 catalog_manager.cc:2892] TS > 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet > 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already > present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already > in progress: opening tablet}} > {{I0129 05:09:43.919517 22190 catalog_manager.cc:2700] Scheduling retry of > 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for > TS=90369522338b4763ae25dd0161d6e548 with a delay of 8226 ms (attempt = 10)}} > {{I0129 05:09:43.960479 22190 catalog_manager.cc:2922] Sending > DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet > 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 > (server:7050) (Replica with old config index 3048677 (current committed > config index is 3054594))}} > {{W0129 05:09:43.961150 22190 catalog_manager.cc:2892] TS > 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet > 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already > present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already > in progress: opening tablet}} > {{I0129 05:09:43.961158 22190 catalog_manager.cc:2700] Scheduling retry of > 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for > TS=90369522338b4763ae25dd0161d6e548 with a delay of 8235 ms (attempt = 10)}} > {{I0129 05:09:44.016152 22190 catalog_manager.cc:2922] Sending > DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet > 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 > (server:7050) (Replica with old config index 3048677 (current committed > config index is 3054594))}} > {{W0129 05:09:44.016383 22190 catalog_manager.cc:2892] TS > 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet > 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already > present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already > in progress: opening tablet}} > {{I0129 05:09:44.016391 22190 catalog_manager.cc:2700] Scheduling retry of > 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for > TS=90369522338b4763ae25dd0161d6e548 with a delay of 8206 ms (attempt = 10)}} > {{I0129 05:09:44.226428 22190 catalog_manager.cc:2922] Sending > 
DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet > 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 > (server:7050) (Replica with old config index 3048677 (current committed > config index is 3054594))}} > {{W0129 05:09:44.226753 22190 catalog_manager.cc:2892] TS > 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet > 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already > present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already > in progress: opening tablet}} > {{I0129 05:09:44.226773 22190 catalog_manager.cc:2700] Scheduling retry of > 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for > TS=90369522338b4763ae25dd0161d6e548 with a delay of 8207 ms (attempt = 10)}} > {{I0129 05:09:44.234709 22190 catalog_manager.cc:2922] Sending > DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet > 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 > (server:7050) (Replica with old config index 3048677 (current committed > config index is 3054594))}} > {{W0129 05:09:44.234923 22190 catalog_manager.cc:2892] TS > 90369522338b4763ae25dd0161d6e548 (server
[jira] [Assigned] (KUDU-3326) Add Soft Delete Table Supports
[ https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong reassigned KUDU-3326: - Assignee: dengke > Add Soft Delete Table Supports > -- > > Key: KUDU-3326 > URL: https://issues.apache.org/jira/browse/KUDU-3326 > Project: Kudu > Issue Type: New Feature > Components: api, CLI, client, master, test >Reporter: dengke >Assignee: dengke >Priority: Major > > h2. Brief description: > Soft delete means that the kudu system will not delete the table > immediately after receiving the command to delete the table. Instead, it will > mark the table and set a validity period. After the validity period, will try > again to determine whether the table really needs to be deleted. > This feature can restore data conveniently and timely in the case of > accidental deletion. > h2. Relevant modification points: > 1. After deleting a table, the original table name will be renamed as > KUDU_TRASHED: < timestamp >: < original table name >, which becomes a trash > table. > 2. The contents of the trash table are exactly the same as those of the > original table. Although it cannot be renamed, added or deleted directly, > it can be read and written normally. The trash table will be retained for a > period of time by default (such as 7 days, which can be modified through > parameters). The compact priority of the trash table will be set to the > lowest to save the system resources. > 3. The master needs to add a thread to process expired trash tables and > perform real deletion. > 4. It is allowed to create a table with the same name as the original table, > and the newly created table with the same name can be deleted normally. > 5. It is allowed to recall deleted tables, but the following two situations > cannot be recalled: the same original table name exists and the trash table > has expired. > 6. The KUDU_TRASHED is a reserved string for the system. Users are not > allowed to create a table with table names starting with KUDU_TRASHED. > 7. Kudu tool adaptation soft deletion. > 8. Java API adaptation soft deletion. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports
[ https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428466#comment-17428466 ] Andrew Wong commented on KUDU-3326: --- Thanks for the contribution so far! I do have some questions regarding design, as well as some thoughts on how I am thinking about them. *How does this work when a table is trashed and recreated? And then trashed again before the reservation is complete?* In this case, it may be worth synchronously deleting the table (and its trashed data) and creating the table with the same non-trash name. That way, we only ever have one trashed table at a time, and the first scenario in #5 is addressed. This runs the risk of losing data if, e.g. the create fails for whatever reason. But I think that is a reasonable behavior, since the user has expressed the desire to forget about the old table by creating a new table for the same name. *How does this work when HMS-synchronization is enabled? The HMS doesn't allow for the ":" character. Are trashed tables deleted from the HMS immediately? Or only when fully deleted?* When deleting a table, we should still propagate the deletion to the HMS (rather than a table rename) immediately. Upon recalling the table, we should recreate the table in the HMS. When performing the {{hms check}} and {{hms fix}} tools, we should probably ignore trashed tables. For that matter, for other tools and APIs (list, open table, etc), we may want to ignore trashed tables as well unless explicitly requested, and ensure the only thing a user can do with a trashed table is recall it. > Add Soft Delete Table Supports > -- > > Key: KUDU-3326 > URL: https://issues.apache.org/jira/browse/KUDU-3326 > Project: Kudu > Issue Type: New Feature > Components: api, CLI, client, master, test >Reporter: dengke >Priority: Major > > h2. Brief description: > Soft delete means that the kudu system will not delete the table > immediately after receiving the command to delete the table. Instead, it will > mark the table and set a validity period. After the validity period, will try > again to determine whether the table really needs to be deleted. > This feature can restore data conveniently and timely in the case of > accidental deletion. > h2. Relevant modification points: > 1. After deleting a table, the original table name will be renamed as > KUDU_TRASHED: < timestamp >: < original table name >, which becomes a trash > table. > 2. The contents of the trash table are exactly the same as those of the > original table. Although it cannot be renamed, added or deleted directly, > it can be read and written normally. The trash table will be retained for a > period of time by default (such as 7 days, which can be modified through > parameters). The compact priority of the trash table will be set to the > lowest to save the system resources. > 3. The master needs to add a thread to process expired trash tables and > perform real deletion. > 4. It is allowed to create a table with the same name as the original table, > and the newly created table with the same name can be deleted normally. > 5. It is allowed to recall deleted tables, but the following two situations > cannot be recalled: the same original table name exists and the trash table > has expired. > 6. The KUDU_TRASHED is a reserved string for the system. Users are not > allowed to create a table with table names starting with KUDU_TRASHED. > 7. Kudu tool adaptation soft deletion. > 8. Java API adaptation soft deletion. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-1620) Consensus peer proxy hostnames should be reresolved on failure
[ https://issues.apache.org/jira/browse/KUDU-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-1620. --- Fix Version/s: 1.16.0 Assignee: Andrew Wong Resolution: Fixed > Consensus peer proxy hostnames should be reresolved on failure > -- > > Key: KUDU-1620 > URL: https://issues.apache.org/jira/browse/KUDU-1620 > Project: Kudu > Issue Type: Bug > Components: consensus >Affects Versions: 1.0.0 >Reporter: Adar Dembo >Assignee: Andrew Wong >Priority: Major > Labels: docker > Fix For: 1.16.0 > > > Noticed this while documenting the workflow to replace a dead master, which > currently bypasses Raft config changes in favor of having the replacement > master "masquerade" as the dead master via DNS changes. > Internally we never rebuild consensus peer proxies in the event of network > failure; we assume that the peer will return at the same location. Nominally > this is reasonable; allowing peers to change host/port information on the fly > is tricky and has yet to be implemented. But, we should at least retry the > DNS resolution; not doing so forces the workflow to include steps to restart > the existing masters, which creates a (small) availability outage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-75) Allow RPC proxies to take HostPort and do DNS resolution inline with calls
[ https://issues.apache.org/jira/browse/KUDU-75?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-75. - Fix Version/s: 1.16.0 Assignee: Andrew Wong Resolution: Fixed > Allow RPC proxies to take HostPort and do DNS resolution inline with calls > -- > > Key: KUDU-75 > URL: https://issues.apache.org/jira/browse/KUDU-75 > Project: Kudu > Issue Type: Improvement > Components: rpc >Affects Versions: M4 >Reporter: Todd Lipcon >Assignee: Andrew Wong >Priority: Major > Fix For: 1.16.0 > > > A lot of RPC calls will be done against host/ports rather than ip/ports. We > should make the Proxy itself do the resolution inline in the async path (and > perhaps have some method to refresh DNS) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-1885) Master caches DNS name resolution forever
[ https://issues.apache.org/jira/browse/KUDU-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-1885. --- Fix Version/s: 1.16.0 Assignee: Andrew Wong Resolution: Fixed > Master caches DNS name resolution forever > - > > Key: KUDU-1885 > URL: https://issues.apache.org/jira/browse/KUDU-1885 > Project: Kudu > Issue Type: Bug > Components: master >Affects Versions: 1.3.0 >Reporter: Adar Dembo >Assignee: Andrew Wong >Priority: Major > Fix For: 1.16.0 > > > TSDescriptor::GetTSAdminProxy() and TSDescriptor::GetConsensusProxy() will > return the same proxy instances over and over. Normally, this is a reasonable > optimization. But suppose the IP address of the tserver changes (due to a > DHCP lease expiring or some such). Now these methods will be returning > unusable proxies, and there's no way to "reset" them. > Admittedly this scenario is a little contrived: if a tserver's IP address > suddenly changes, a bunch of other stuff will break too. The tserver will > probably need to be restarted (since it's bound to a socket whose address no > longer exists), and consensus may be thoroughly wrecked due to built-in > host/port assumptions (see KUDU-418). > An issue like this was reported by a user in Slack, who was running a master > and tserver on the same box. The symptom was "half-open" communication > between them: the tserver could heartbeat to the master, but the master could > not send RPCs to the tserver. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3300) Include the full path of the container in the error message
[ https://issues.apache.org/jira/browse/KUDU-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong updated KUDU-3300: -- Labels: newbie (was: ) > Include the full path of the container in the error message > --- > > Key: KUDU-3300 > URL: https://issues.apache.org/jira/browse/KUDU-3300 > Project: Kudu > Issue Type: Improvement > Components: cfile >Reporter: Abhishek >Priority: Minor > Labels: newbie > > If there are multiple data directories configured, having the Linux path to > the full container file will help locate the file without having to search > for it. > Check failed: _s.ok() Bad status: Corruption: Failed to load FS layout: Could > not open container 26f5cbd97dfe4cb98f49bb0a6a494e8f: Invalid magic number: > Expected: kuducntr, found: \000\000\020\001\030▒▒▒ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3311) Allow masters to start up with a list of masters with a diff of one from what's on disk
[ https://issues.apache.org/jira/browse/KUDU-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong updated KUDU-3311: -- Labels: newbie (was: ) > Allow masters to start up with a list of masters with a diff of one from > what's on disk > --- > > Key: KUDU-3311 > URL: https://issues.apache.org/jira/browse/KUDU-3311 > Project: Kudu > Issue Type: Improvement > Components: master >Reporter: Andrew Wong >Priority: Major > Labels: newbie > > Now that Kudu automatically adds a master if we start up a new master > alongside an existing set of masters, we should also loosen the > restriction that the gflag is the same as the existing Raft config, in case > users want to add a master and then restart the entire cluster at the same > time. > This seems like it would be common enough for orchestration tools like CM, > which marks the cluster as stale and suggests a full service restart upon > adding a new master role. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3325) When wal is deleted, fault recovery and load balancing are abnormal
[ https://issues.apache.org/jira/browse/KUDU-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427361#comment-17427361 ] Andrew Wong commented on KUDU-3325: --- I'm curious -- why was the WAL deleted in the first place? In general, Kudu never expects that files are deleted out from underneath it. Was this caused by some power failure? Some disk loss? I think the best route forward would be to treat the tablet as failed, and re-replicate from another replica if available. > When wal is deleted, fault recovery and load balancing are abnormal > --- > > Key: KUDU-3325 > URL: https://issues.apache.org/jira/browse/KUDU-3325 > Project: Kudu > Issue Type: Bug > Components: consensus >Reporter: yejiabao_h >Priority: Major > Attachments: image-2021-10-06-15-36-40-996.png, > image-2021-10-06-15-36-53-813.png, image-2021-10-06-15-37-09-520.png, > image-2021-10-06-15-37-24-776.png, image-2021-10-06-15-37-42-533.png, > image-2021-10-06-15-37-54-782.png, image-2021-10-06-15-38-06-575.png, > image-2021-10-06-15-38-17-388.png, image-2021-10-06-15-38-29-176.png, > image-2021-10-06-15-38-39-852.png, image-2021-10-06-15-38-53-343.png, > image-2021-10-06-15-39-03-296.png, image-2021-10-06-19-23-51-769.png > > > h3. 1、Use kudu leader_step_down to create multiple WAL messages > ./kudu tablet leader_step_down $MASTER_IP 1299f5a939d2453c83104a6db0cae3e7 > h4. wal > !image-2021-10-06-15-36-40-996.png! > h4. cmeta > !image-2021-10-06-15-36-53-813.png! > h3. 2、Stop one of the tservers to start tablet recovery, so that the > opid_index is flushed to cmeta > !image-2021-10-06-15-37-09-520.png! > h4. wal > !image-2021-10-06-15-37-24-776.png! > h4. cmeta > !image-2021-10-06-15-37-42-533.png! > h3. 3、Stop all tservers and delete the tablet WAL > !image-2021-10-06-15-37-54-782.png! > h3. 4、Start all tservers > We can see the index in the WAL starts counting from 1, but the opid_index > recorded in cmeta is the value 20 from before the WAL was deleted > > h4. wal > !image-2021-10-06-15-38-06-575.png! > > h4. cmeta > !image-2021-10-06-15-38-17-388.png! > > h3. 5、Stop a tserver to trigger fault recovery > !image-2021-10-06-15-38-29-176.png! > When the leader recovers a replica and the master requests a Raft config change to > add the new replica to the new Raft config, the leader replica ignores the request > because the opid_index is smaller than the one in cmeta. > > h3. 6、Delete all WALs > !image-2021-10-06-15-38-39-852.png! > h3. 7、kudu cluster rebalance > ./kudu cluster rebalance $MASTER_IP > !image-2021-10-06-15-38-53-343.png! > !image-2021-10-06-15-39-03-296.png! > The rebalance also fails when changing the Raft config -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3311) Allow masters to start up with a list of masters with a diff of one from what's on disk
Andrew Wong created KUDU-3311: - Summary: Allow masters to start up with a list of masters with a diff of one from what's on disk Key: KUDU-3311 URL: https://issues.apache.org/jira/browse/KUDU-3311 Project: Kudu Issue Type: Improvement Components: master Reporter: Andrew Wong Now that Kudu automatically adds a master if we start up a new master alongside an existing set of masters, we should also loosen the restriction that the gflag is the same as the existing Raft config, in case users want to add a master and then restart the entire cluster at the same time. This seems like it would be common enough for orchestration tools like CM, which marks the cluster as stale and suggests a full service restart upon adding a new master role. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3310) Checksum scan results for lagging replicas can be confusing
Andrew Wong created KUDU-3310: - Summary: Checksum scan results for lagging replicas can be confusing Key: KUDU-3310 URL: https://issues.apache.org/jira/browse/KUDU-3310 Project: Kudu Issue Type: Improvement Components: ops-tooling Reporter: Andrew Wong When running a checksum scan, we've seen cases where the following is reported: {code} Error: Remote error: Service unavailable: Timed out: could not wait for desired snapshot timestamp to be consistent: Timed out waiting for ts: P: 1621906798986764 usec, L: 0 to be safe (mode: NON-LEADER). Current safe time: P: 1621906798962044 usec, L: 0 Physical time difference: 0.025s {code} and this results in messages like: {code} Aborted: checksum scan error: 1 errors were detected {code} Without much context about Kudu, this makes it seem like there is some corruption between replicas, even though the issue is just that the replica is lagging a bit. We should consider either: - allowing the wait time to be configured when running the tool, or - rewording the result such that it's clear the scan failed and no checksums were verified for the tablet -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3290) Implement Replicate table's data to Kafka(or other Storage System)
[ https://issues.apache.org/jira/browse/KUDU-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393526#comment-17393526 ] Andrew Wong commented on KUDU-3290: --- Sorry for the late reply. I do think that, between the two, the learner replica seems more palatable, given it leaves the flexibility of decoupling the IO from the rest of the cluster (e.g. if we put all learners in a single tablet server). That said, it also increases the amount of IO done, given we have to replicate to an extra node. Maybe that's fine though. {quote} we can trigger a full scan at the timestamp and replicate data to learner, and then recover the appendEntries flow {quote} I'm not sure I understand this part, but I think you're referring to the conceptual equivalent of performing a tablet copy. When there aren't enough WALs in the leader to catch up a replica, the leader sends a tablet copy request to the follower, the follower is "caught up" via a tablet copy, and then the tablet is scanned and sent to Kafka. Is that right? In this case, for the remote learner, I wonder if in addition to a regular tablet copy to the local learner, there is room here to also rely on the incremental scan developed for backups. If the learner knows what index it has replicated to Kafka, it should also be able to keep track of the timestamp associated with that OpId. If so, Kudu should be able to perform a differential scan between that timestamp and the latest timestamp in the newly copied replica. Of course, if the retention window is too short, this wouldn't work. Also, in your proposal you mentioned replicas and leaders keeping track of more state for the sake of catching up the external service. If so, it'd be great if you could clarify exactly what state we would need (most recently replicated OpId? maybe its timestamp? anything else?) and where that state would be stored (with consensus metadata? somewhere else?). > Implement Replicate table's data to Kafka(or other Storage System) > -- > > Key: KUDU-3290 > URL: https://issues.apache.org/jira/browse/KUDU-3290 > Project: Kudu > Issue Type: New Feature > Components: tserver >Reporter: shenxingwuying >Priority: Critical > > h1. Background & problem > We use Kudu to store user profile data. Because of business requirements to > exchange and share data among multi-tenant users, which is reasonable in our > application scenario, we need to replicate data from one system to another. The > destination storage system we pick is Kafka, because of our company’s > current architecture. > At this time, we have two ideas to solve it. > h1. Two replication schemes > Generally, a Raft group has three replicas: one is the leader and the other two are > followers. We’ll add a replica whose role is Learner. The Learner only receives all > the data, but does not participate in the leadership election. > The learner replica’s state machine will be a plugin system, e.g.: > # We can support KuduEngine, which is just a data backup, like MongoDB’s hidden > replica. > # We can write to a third-party storage system, like Kafka or any other > system we need. Then we can replicate data to another system using its client. > Paxos has a learner role, which only receives data; we need such a role for the > new membership. > But in Kudu, Learner has been used for the copying (recovering) tablet replica. > Maybe we need a new role name; for now, we still use Learner to represent the > new role.
(We should think over the new role name) > In our application scenario, we will replicate data to Kafka, and I will explain > the method. > h2. Learner replication > # Add a new replica role; maybe we call it learner, because Paxos has a > learner role, which only receives data. We need such a role for the new > membership. But in Kudu, Learner has been used for the copying (recovering) > tablet replica. Maybe we need a new role name; for now, we still use Learner > to represent the new role. (We should think over the new role name) > # The voters’ safepoint for cleaning obsolete WALs is min(leader’s max WAL > sequence number, followers’ max WAL sequence number, learner’s max WAL sequence > number) > # The learner is not a voter and does not participate in elections > # Raft can replicate data to the learner > # The learner’s apply process works just like that of Raft followers: the logs before > the committed index will be replicated to Kafka, and once Kafka’s response is OK, the apply index > will increase. > # We need a Kafka client; it will be added to Kudu as an option, maybe as a > compile option > # When a kudu-tserver is decommissioned or corrupted, the learner must move to a new > kudu-tserver. So the leader should save the learner’s apply OpId and replicate it to > followers, to handle the learner’s failover when the leader goes down. > # The leader must save the learners apply O
[jira] [Resolved] (KUDU-3307) Update Kudu docker entry script to take data directories parameter
[ https://issues.apache.org/jira/browse/KUDU-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-3307. --- Fix Version/s: 1.16.0 Resolution: Fixed > Update Kudu docker entry script to take data directories parameter > -- > > Key: KUDU-3307 > URL: https://issues.apache.org/jira/browse/KUDU-3307 > Project: Kudu > Issue Type: Improvement > Components: docker >Reporter: Bankim Bhavsar >Assignee: Andrew Wong >Priority: Major > Labels: newbie > Fix For: 1.16.0 > > > Current docker entry point script takes environment variable {{DATA_DIR}}. > However it's expected to be a single directory and that's supplied as > {{-fs_wal_dir}} and not as one would expect {{-fs_data_dirs}}. > [https://github.com/apache/kudu/blob/master/docker/kudu-entrypoint.sh#L41] > [https://github.com/apache/kudu/blob/master/docker/kudu-entrypoint.sh#L57-L59] > We need to make updates to the entry script to be able to supply separate > configuration for data directories. Need to ensure these directories are > either created in the script or possibly within kudu server. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KUDU-3307) Update Kudu docker entry script to take data directories parameter
[ https://issues.apache.org/jira/browse/KUDU-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong reassigned KUDU-3307: - Assignee: Andrew Wong (was: Abhishek) > Update Kudu docker entry script to take data directories parameter > -- > > Key: KUDU-3307 > URL: https://issues.apache.org/jira/browse/KUDU-3307 > Project: Kudu > Issue Type: Improvement > Components: docker >Reporter: Bankim Bhavsar >Assignee: Andrew Wong >Priority: Major > Labels: newbie > > Current docker entry point script takes environment variable {{DATA_DIR}}. > However it's expected to be a single directory and that's supplied as > {{-fs_wal_dir}} and not as one would expect {{-fs_data_dirs}}. > [https://github.com/apache/kudu/blob/master/docker/kudu-entrypoint.sh#L41] > [https://github.com/apache/kudu/blob/master/docker/kudu-entrypoint.sh#L57-L59] > We need to make updates to the entry script to be able to supply separate > configuration for data directories. Need to ensure these directories are > either created in the script or possibly within kudu server. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3296) Synchronize the master addresses with the HMS when the Master performs a Raft config change
Andrew Wong created KUDU-3296: - Summary: Synchronize the master addresses with the HMS when the Master performs a Raft config change Key: KUDU-3296 URL: https://issues.apache.org/jira/browse/KUDU-3296 Project: Kudu Issue Type: Bug Components: master Reporter: Andrew Wong Today, the leader master can service config changes to add or remove replicas in its Raft config. It would be great if, after doing this successfully, the leader master also sent requests to the HMS to update all tables' metadata to reflect this change, given each HMS table entry includes the master addresses for discoverability. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KUDU-3294) ksck and rebalancer tools are useless if they can't resolve any of the tserver addresses
[ https://issues.apache.org/jira/browse/KUDU-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong reassigned KUDU-3294: - Assignee: Andrew Wong > ksck and rebalancer tools are useless if they can't resolve any of the > tserver addresses > > > Key: KUDU-3294 > URL: https://issues.apache.org/jira/browse/KUDU-3294 > Project: Kudu > Issue Type: Bug > Components: ops-tooling >Reporter: Andrew Wong >Assignee: Andrew Wong >Priority: Major > > One of the first steps we perform when running {{ksck}} or the rebalancer > tool is to resolve the addresses of all tablet servers to initialize the proxies > that will be used by each tool. If this step fails, the tool returns early, > resulting in pitifully barren output along the lines of "an empty cluster", > or simply a report that contains no tables, and not much to debug with other > than a complaint like the following: > {code:java} > Network error: error fetching the cluster metadata from the leader master: > unable to resolve address for : Name or service not known {code} > At worst, we should just skip over the tablet server and treat it as failed > in the report. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3294) ksck and rebalancer tools are useless if they can't resolve any of the tserver addresses
Andrew Wong created KUDU-3294: - Summary: ksck and rebalancer tools are useless if they can't resolve any of the tserver addresses Key: KUDU-3294 URL: https://issues.apache.org/jira/browse/KUDU-3294 Project: Kudu Issue Type: Bug Components: ops-tooling Reporter: Andrew Wong One of the first steps we perform when running {{ksck}} or the rebalancer tool is to resolve the addresses of all tablet servers to initialize the proxies that will be used by each tool. If this step fails, the tool returns early, resulting in pitifully barren output along the lines of "an empty cluster", or simply a report that contains no tables, and not much to debug with other than a complaint like the following: {code:java} Network error: error fetching the cluster metadata from the leader master: unable to resolve address for : Name or service not known {code} At worst, we should just skip over the tablet server and treat it as failed in the report. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2302) Leader crashes if it can't resolve DNS address of a peer
[ https://issues.apache.org/jira/browse/KUDU-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-2302. --- Fix Version/s: 1.16 Resolution: Fixed > Leader crashes if it can't resolve DNS address of a peer > > > Key: KUDU-2302 > URL: https://issues.apache.org/jira/browse/KUDU-2302 > Project: Kudu > Issue Type: Bug > Components: consensus, master, tserver >Affects Versions: 1.6.0, 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.10.0, 1.10.1, > 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0 >Reporter: Todd Lipcon >Assignee: Andrew Wong >Priority: Critical > Labels: crash, roadmap-candidate, stability > Fix For: 1.16 > > > In BecomeLeader we call: > {code} > CHECK_OK(BecomeLeaderUnlocked()); > {code} > This will fail if it fails to resolve the address of one of its peers. > Instead it should probably continue to be leader but consider attempts to RPC > to that peer to be failed due to network resolution (with periodic retries of > resolution) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KUDU-2302) Leader crashes if it can't resolve DNS address of a peer
[ https://issues.apache.org/jira/browse/KUDU-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong reassigned KUDU-2302: - Assignee: Andrew Wong > Leader crashes if it can't resolve DNS address of a peer > > > Key: KUDU-2302 > URL: https://issues.apache.org/jira/browse/KUDU-2302 > Project: Kudu > Issue Type: Bug > Components: consensus, master, tserver >Affects Versions: 1.6.0, 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.10.0, 1.10.1, > 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0 >Reporter: Todd Lipcon >Assignee: Andrew Wong >Priority: Critical > Labels: crash, roadmap-candidate, stability > > In BecomeLeader we call: > {code} > CHECK_OK(BecomeLeaderUnlocked()); > {code} > This will fail if it fails to resolve the address of one of its peers. > Instead it should probably continue to be leader but consider attempts to RPC > to that peer to be failed due to network resolution (with periodic retries of > resolution) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-3291) Crash when performing a diff scan after delta flush races with a batch of ops that update the same row
[ https://issues.apache.org/jira/browse/KUDU-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-3291. --- Fix Version/s: 1.15.0 Resolution: Fixed > Crash when performing a diff scan after delta flush races with a batch of ops > that update the same row > -- > > Key: KUDU-3291 > URL: https://issues.apache.org/jira/browse/KUDU-3291 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0 >Reporter: Andrew Wong >Assignee: Andrew Wong >Priority: Critical > Fix For: 1.15.0 > > > It's possible to run into the following crash: > {code:java} > F0604 23:20:50.032124 35483072 delta_store.h:153] Check failed: > a.delta_store_id == b.delta_store_id (4445773336 vs. 4445771896) > *** Check failure stack trace: *** > *** Aborted at 1622874050 (unix time) try "date -d @1622874050" if you are > using GNU date *** > PC: @ 0x7fff724b033a __pthread_kill > *** SIGABRT (@0x7fff724b033a) received by PID 69138 (TID 0x1021d6dc0) stack > trace: *** > @ 0x7fff725615fd _sigtramp > @ 0x7ffeef948568 (unknown) > @ 0x7fff72437808 abort > @0x107920599 google::logging_fail() > @0x10791f4cf google::LogMessage::SendToLog() > @0x10791fb95 google::LogMessage::Flush() > @0x107923c9f google::LogMessageFatal::~LogMessageFatal() > @0x107920b29 google::LogMessageFatal::~LogMessageFatal() > @0x1009ae07e > kudu::tablet::SelectedDeltas::DeltaLessThanFunctor::operator()() > @0x1009aa561 std::__1::max<>() > @0x10099c740 kudu::tablet::SelectedDeltas::ProcessDelta() > @0x10099e719 kudu::tablet::SelectedDeltas::MergeFrom() > @0x1009a2b30 kudu::tablet::DeltaPreparer<>::SelectDeltas() > @0x10094a545 kudu::tablet::DeltaFileIterator<>::SelectDeltas() > @0x10098b10c kudu::tablet::DeltaIteratorMerger::SelectDeltas() > @0x10097133f > kudu::tablet::DeltaApplier::InitializeSelectionVector() > @0x1056df4fb kudu::MaterializingIterator::MaterializeBlock() > @0x1056df2d8 kudu::MaterializingIterator::NextBlock() > @0x1056d1c5b kudu::MergeIterState::PullNextBlock() > @0x1056d5e62 kudu::MergeIterator::RefillHotHeap() > @0x1056d4f0b kudu::MergeIterator::Init() > @0x1006a413d kudu::tablet::Tablet::Iterator::Init() > @0x1002cb3b9 > kudu::tablet::DiffScanTest_TestDiffScanAfterDeltaFlush_Test::TestBody() > @0x1005f1b88 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1005f1add testing::Test::Run() > @0x1005f2dd0 testing::TestInfo::Run() > @0x1005f3807 testing::TestSuite::Run() > @0x100601b57 testing::internal::UnitTestImpl::RunAllTests() > @0x100601418 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x10060139c testing::UnitTest::Run() > @0x100476201 RUN_ALL_TESTS() > @0x100475fa8 main > {code} > The [crash > line|https://github.com/apache/kudu/blob/e574903ace741a531c49aba15f97e856ea80ca4b/src/kudu/tablet/delta_store.h#L149] > assumes that all deltas for a given row that have the same timestamp belong > in the same delta store, and it uses this assumption to order the deltas in a > diff scan. > However, this is not true because, unlike the case for MRS flushes, we don't > wait for all ops to finish applying before flushing the DMS. This means that > a batch containing multiple updates to the same row may be spread across > multiple DMSs if we delta flush while the batch of updates is being applied. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3291) Crash when performing a diff scan after delta flush races with a batch of ops that update the same row
[ https://issues.apache.org/jira/browse/KUDU-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong updated KUDU-3291: -- Code Review: https://gerrit.cloudera.org/c/17547 > Crash when performing a diff scan after delta flush races with a batch of ops > that update the same row > -- > > Key: KUDU-3291 > URL: https://issues.apache.org/jira/browse/KUDU-3291 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0 >Reporter: Andrew Wong >Assignee: Andrew Wong >Priority: Critical > > It's possible to run into the following crash: > {code:java} > F0604 23:20:50.032124 35483072 delta_store.h:153] Check failed: > a.delta_store_id == b.delta_store_id (4445773336 vs. 4445771896) > *** Check failure stack trace: *** > *** Aborted at 1622874050 (unix time) try "date -d @1622874050" if you are > using GNU date *** > PC: @ 0x7fff724b033a __pthread_kill > *** SIGABRT (@0x7fff724b033a) received by PID 69138 (TID 0x1021d6dc0) stack > trace: *** > @ 0x7fff725615fd _sigtramp > @ 0x7ffeef948568 (unknown) > @ 0x7fff72437808 abort > @0x107920599 google::logging_fail() > @0x10791f4cf google::LogMessage::SendToLog() > @0x10791fb95 google::LogMessage::Flush() > @0x107923c9f google::LogMessageFatal::~LogMessageFatal() > @0x107920b29 google::LogMessageFatal::~LogMessageFatal() > @0x1009ae07e > kudu::tablet::SelectedDeltas::DeltaLessThanFunctor::operator()() > @0x1009aa561 std::__1::max<>() > @0x10099c740 kudu::tablet::SelectedDeltas::ProcessDelta() > @0x10099e719 kudu::tablet::SelectedDeltas::MergeFrom() > @0x1009a2b30 kudu::tablet::DeltaPreparer<>::SelectDeltas() > @0x10094a545 kudu::tablet::DeltaFileIterator<>::SelectDeltas() > @0x10098b10c kudu::tablet::DeltaIteratorMerger::SelectDeltas() > @0x10097133f > kudu::tablet::DeltaApplier::InitializeSelectionVector() > @0x1056df4fb kudu::MaterializingIterator::MaterializeBlock() > @0x1056df2d8 kudu::MaterializingIterator::NextBlock() > @0x1056d1c5b kudu::MergeIterState::PullNextBlock() > @0x1056d5e62 kudu::MergeIterator::RefillHotHeap() > @0x1056d4f0b kudu::MergeIterator::Init() > @0x1006a413d kudu::tablet::Tablet::Iterator::Init() > @0x1002cb3b9 > kudu::tablet::DiffScanTest_TestDiffScanAfterDeltaFlush_Test::TestBody() > @0x1005f1b88 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x1005f1add testing::Test::Run() > @0x1005f2dd0 testing::TestInfo::Run() > @0x1005f3807 testing::TestSuite::Run() > @0x100601b57 testing::internal::UnitTestImpl::RunAllTests() > @0x100601418 > testing::internal::HandleExceptionsInMethodIfSupported<>() > @0x10060139c testing::UnitTest::Run() > @0x100476201 RUN_ALL_TESTS() > @0x100475fa8 main > {code} > The [crash > line|https://github.com/apache/kudu/blob/e574903ace741a531c49aba15f97e856ea80ca4b/src/kudu/tablet/delta_store.h#L149] > assumes that all deltas for a given row that have the same timestamp belong > in the same delta store, and it uses this assumption to order the deltas in a > diff scan. > However, this is not true because, unlike the case for MRS flushes, we don't > wait for all ops to finish applying before flushing the DMS. This means that > a batch containing multiple updates to the same row may be spread across > multiple DMSs if we delta flush while the batch of updates is being applied. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3291) Crash when performing a diff scan after delta flush races with a batch of ops that update the same row
Andrew Wong created KUDU-3291: - Summary: Crash when performing a diff scan after delta flush races with a batch of ops that update the same row Key: KUDU-3291 URL: https://issues.apache.org/jira/browse/KUDU-3291 Project: Kudu Issue Type: Bug Affects Versions: 1.14.0, 1.13.0, 1.11.1, 1.12.0, 1.11.0, 1.10.1, 1.10.0 Reporter: Andrew Wong Assignee: Andrew Wong It's possible to run into the following crash: {code:java} F0604 23:20:50.032124 35483072 delta_store.h:153] Check failed: a.delta_store_id == b.delta_store_id (4445773336 vs. 4445771896) *** Check failure stack trace: *** *** Aborted at 1622874050 (unix time) try "date -d @1622874050" if you are using GNU date *** PC: @ 0x7fff724b033a __pthread_kill *** SIGABRT (@0x7fff724b033a) received by PID 69138 (TID 0x1021d6dc0) stack trace: *** @ 0x7fff725615fd _sigtramp @ 0x7ffeef948568 (unknown) @ 0x7fff72437808 abort @0x107920599 google::logging_fail() @0x10791f4cf google::LogMessage::SendToLog() @0x10791fb95 google::LogMessage::Flush() @0x107923c9f google::LogMessageFatal::~LogMessageFatal() @0x107920b29 google::LogMessageFatal::~LogMessageFatal() @0x1009ae07e kudu::tablet::SelectedDeltas::DeltaLessThanFunctor::operator()() @0x1009aa561 std::__1::max<>() @0x10099c740 kudu::tablet::SelectedDeltas::ProcessDelta() @0x10099e719 kudu::tablet::SelectedDeltas::MergeFrom() @0x1009a2b30 kudu::tablet::DeltaPreparer<>::SelectDeltas() @0x10094a545 kudu::tablet::DeltaFileIterator<>::SelectDeltas() @0x10098b10c kudu::tablet::DeltaIteratorMerger::SelectDeltas() @0x10097133f kudu::tablet::DeltaApplier::InitializeSelectionVector() @0x1056df4fb kudu::MaterializingIterator::MaterializeBlock() @0x1056df2d8 kudu::MaterializingIterator::NextBlock() @0x1056d1c5b kudu::MergeIterState::PullNextBlock() @0x1056d5e62 kudu::MergeIterator::RefillHotHeap() @0x1056d4f0b kudu::MergeIterator::Init() @0x1006a413d kudu::tablet::Tablet::Iterator::Init() @0x1002cb3b9 kudu::tablet::DiffScanTest_TestDiffScanAfterDeltaFlush_Test::TestBody() @0x1005f1b88 testing::internal::HandleExceptionsInMethodIfSupported<>() @0x1005f1add testing::Test::Run() @0x1005f2dd0 testing::TestInfo::Run() @0x1005f3807 testing::TestSuite::Run() @0x100601b57 testing::internal::UnitTestImpl::RunAllTests() @0x100601418 testing::internal::HandleExceptionsInMethodIfSupported<>() @0x10060139c testing::UnitTest::Run() @0x100476201 RUN_ALL_TESTS() @0x100475fa8 main {code} The [crash line|https://github.com/apache/kudu/blob/e574903ace741a531c49aba15f97e856ea80ca4b/src/kudu/tablet/delta_store.h#L149] assumes that all deltas for a given row that have the same timestamp belong in the same delta store, and it uses this assumption to order the deltas in a diff scan. However, this is not true because, unlike the case for MRS flushes, we don't wait for all ops to finish applying before flushing the DMS. This means that a batch containing multiple updates to the same row may be spread across multiple DMSs if we delta flush while the batch of updates is being applied. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KUDU-3290) Implement Replicate table's data to Kafka(or other Storage System)
[ https://issues.apache.org/jira/browse/KUDU-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356981#comment-17356981 ] Andrew Wong edited comment on KUDU-3290 at 6/3/21, 11:54 PM: - I'm curious what the use case is for the data stored in Kafka. Is it meant to be a physical backup of all ops that are sent to Kudu? A couple of other approaches come to mind that don't go quite as deep into the weeds of Kudu as what's outlined here, so I'm curious if they would fit your needs: * Periodically run a differential scan in Kudu, similar to an incremental backup, but storing the results in Kafka. Kudu's differential backup allows users to supply two (relatively recent) timestamps and Kudu will return the logical changes to the table that happened between those timestamps. This doesn't work that well if the goal is to get every individual mutation between the two timestamps, since diff scans summarize the changes between the two timestamps. However, if the process performing this ran frequently enough (e.g. every few seconds), real-time replication may not be out of the picture. [Here's|https://github.com/apache/kudu/blob/master/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala#L78] an example of Spark using this kind of scan. * With whatever ingestion tool is currently writing to Kudu, also write to Kafka. Usually we see pipelines built with Spark or NiFi or StreamSets that ingest to Kudu – if you have such a pipeline, duplicating the pipeline into Kafka would be another relatively quick solution, though with the caveat that failures of either system may need to be worked around. One concern I would have about the built-in Raft replication to an external system is the fact that Kudu Raft replicates all writes before applying the operations to the underlying data stores. This means that we may replicate operations that fail because a row already exists, and that is only caught after replication. Depending on how you are using the Kafka replica, I'm not sure of the best way to handle this. Perhaps that's okay for your usage, but I can see it being confusing to replicate row operations that failed in Kudu. was (Author: andrew.wong): I'm curious what the use case is for the data stored in Kafka. Is it meant to be a physical backup of all ops that are sent to Kudu? A couple of other approaches come to mind that don't go quite as deep into the weeds of Kudu as what's outlined here, so I'm curious if they would fit your needs: * Periodically run a differential scan in Kudu, similar to an incremental backup, but storing the results in Kafka. Kudu's differential backup allows users to supply two (relatively recent) timestamps and Kudu will return the logical changes to the table that happened between those timestamps. This doesn't work that well if the goal is to get every individual mutation between the two timestamps, since diff scans summarize the changes between the two timestamps. However, if the process performing this ran frequently enough (e.g. every few seconds), real-time replication may not be out of the picture. [Here's|https://github.com/apache/kudu/blob/master/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala#L78] an example of Spark using this kind of scan. * With whatever ingestion tool is currently writing to Kudu, also write to Kafka.
Usually we see pipelines built with Spark or NiFi or StreamSets that ingest to Kudu – if you have such a pipeline, duplicating the pipeline into Kafka would be another relatively quick solution, though with the caveat that failures of either system may need to be worked around. One concern I would have about the built-in replication to an external system is the fact that Kudu Raft replicates all writes before applying the operations to the underlying data stores. This means that we may replicate operations that fail because a row already exists, and that is only caught after replication. Depending on how you are using the Kafka replica, I'm not sure of the best way to handle this. Perhaps that's okay for your usage, but I can see it being confusing to replicate row operations that failed in Kudu. > Implement Replicate table's data to Kafka(or other Storage System) > -- > > Key: KUDU-3290 > URL: https://issues.apache.org/jira/browse/KUDU-3290 > Project: Kudu > Issue Type: New Feature > Components: tserver >Reporter: shenxingwuying >Priority: Critical > > h1. background & problem > We use Kudu to store user profile data. Because of business requirements to > exchange and share data across multi-tenant users, which is reasonable in our > application scenario, we need to replicate data from one system to another. The
[jira] [Commented] (KUDU-3290) Implement Replicate table's data to Kafka(or other Storage System)
[ https://issues.apache.org/jira/browse/KUDU-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356981#comment-17356981 ] Andrew Wong commented on KUDU-3290: --- I'm curious what the use case is for the data stored in Kafka. Is it meant to be a physical backup of all ops that are sent to Kudu? A couple of other approaches come to mind that don't go quite as deep into the weeds of Kudu as what's outlined here, so I'm curious if they would fit your needs: * Periodically run a differential scan in Kudu, similar to an incremental backup, but storing the results in Kafka. Kudu's differential backup allows users to supply two (relatively recent) timestamps and Kudu will return the logical changes to the table that happened between those timestamps. This doesn't work that well if the goal is to get every individual mutation between the two timestamps, since diff scans summarize the changes between the two timestamps. However, if the process performing this ran frequently enough (e.g. every few seconds), real-time replication may not be out of the picture. [Here's|https://github.com/apache/kudu/blob/master/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala#L78] an example of Spark using this kind of scan. * With whatever ingestion tool is currently writing to Kudu, also write to Kafka. Usually we see pipelines built with Spark or NiFi or StreamSets that ingest to Kudu – if you have such a pipeline, duplicating the pipeline into Kafka would be another relatively quick solution, though with the caveat that failures of either system may need to be worked around. One concern I would have about the built-in replication to an external system is the fact that Kudu Raft replicates all writes before applying the operations to the underlying data stores. This means that we may replicate operations that fail because a row already exists, and that is only caught after replication. Depending on how you are using the Kafka replica, I'm not sure of the best way to handle this. Perhaps that's okay for your usage, but I can see it being confusing to replicate row operations that failed in Kudu. > Implement Replicate table's data to Kafka(or other Storage System) > -- > > Key: KUDU-3290 > URL: https://issues.apache.org/jira/browse/KUDU-3290 > Project: Kudu > Issue Type: New Feature > Components: tserver >Reporter: shenxingwuying >Priority: Critical > > h1. background & problem > We use Kudu to store user profile data. Because of business requirements to > exchange and share data across multi-tenant users, which is reasonable in our > application scenario, we need to replicate data from one system to another. The > destination storage system we picked is Kafka, because of our company's > current architecture. > At this time, we have two ideas for solving it. > h1. two replication schemes > Generally, a Raft group has three replicas: one is the leader and the other two are > followers. We'll add a replica whose role is Learner. A Learner only receives all > the data and does not participate in leader elections. > The learner replica's state machine will be a plugin system, e.g.: > # We can support KuduEngine, which is just a data backup, like MongoDB's hidden > replica. > # We can write to a third-party storage system, like Kafka or any other > system we need. Then we can replicate data to another system using its client. > Paxos has a learner role, which only receives data; we need such a role as a new > membership type.
> But in Kudu, Learner has already been used for the copying (recovering) of tablet replicas. > Maybe we need a new role name; for now, we still use Learner to represent the > new role. (We should think over a new role name.) > In our application scenario, we will replicate data to Kafka, and I will explain > the method. > h2. Learner replication > # Add a new replica role; maybe we call it learner, because Paxos has a > learner role, which only receives data. We need such a role as a new > membership type. But in Kudu, Learner has already been used for the copying (recovering) of > tablet replicas. Maybe we need a new role name; for now, we still use Learner > to represent the new role. (We should think over a new role name.) > # The voters' safepoint for cleaning obsolete WAL is min(leader's max WAL > sequence number, followers' max WAL sequence numbers, learner's max WAL sequence > number); a sketch of this computation follows this message. > # The learner is not a voter and does not participate in elections. > # Raft can replicate data to the learner. > # The learner's apply process works just like a Raft follower's: the log entries before > the committed index are replicated to Kafka, and once Kafka responds OK, the apply index > advances. > # We need a Kafka client; it will be added to Kudu as an option, maybe as a > compile option. > # When a kudu-tserver decomi
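As referenced in the list above, a minimal sketch of the WAL GC safepoint the proposal describes, assuming each replica (learner included) reports the highest WAL index it has durably received; the function and parameter names are illustrative, not existing Kudu APIs:

{code}
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch: WAL segments may only be GCed up to the minimum index received by
// every replica, so a lagging Kafka-facing learner holds back log retention.
int64_t WalGcSafepoint(int64_t leader_max_index,
                       const std::vector<int64_t>& follower_max_indexes,
                       int64_t learner_max_index) {
  int64_t safepoint = std::min(leader_max_index, learner_max_index);
  for (int64_t idx : follower_max_indexes) {
    safepoint = std::min(safepoint, idx);
  }
  return safepoint;
}
{code}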
[jira] [Resolved] (KUDU-3288) tserver segfault when processing DeleteTablet
[ https://issues.apache.org/jira/browse/KUDU-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-3288. --- Fix Version/s: 1.15.0 Resolution: Duplicate This is likely a duplicate of KUDU-3268, which is fixed in the upcoming release (1.15.0). > tserver segfault when processing DeleteTablet > - > > Key: KUDU-3288 > URL: https://issues.apache.org/jira/browse/KUDU-3288 > Project: Kudu > Issue Type: Bug > Components: tserver >Affects Versions: 1.14.0 >Reporter: mintao >Priority: Major > Fix For: 1.15.0 > > > In the core dump, the stack: > {code:java} > #0 0x0251e403 in > kudu::MaintenanceManager::LaunchOp(kudu::MaintenanceOp*) () at > /opt/kudu/kudu/src/kudu/util/maintenance_manager.cc:551 > #1 0x0257c98e in operator() (this=0x7f4425076af0) at > /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:260 > #2 kudu::ThreadPool::DispatchThread() () at > /opt/kudu/kudu/src/kudu/util/threadpool.cc:662 > #3 0x02573e25 in operator() (this=0x6f86fe8) at > /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:260 > #4 kudu::Thread::SuperviseThread(void*) () at > /opt/kudu/kudu/src/kudu/util/thread.cc:674 > #5 0x7f442c9bfe25 in start_thread () from /lib64/libpthread.so.0 > #6 0x7f442ac95bad in clone () from /lib64/libc.so.6 > {code} > The local variables : > {code:java} > thread_id = 164113 > op_instance = {thread_id = 164113, > name = > "CompactRowSetsOp(2c61e21e2e0b4caba1736b5c248dd65e)\000\000\000\000\000\000\350\270\344\002", > '\000' , > "W\000\000\000\345\005\000\000\250ǻ>\001\000\000\000\260ߣi\000\000\000\000\000\030\323\347", > '\000' , > "P\344\033\313\033\063\\\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\064-><.h > 8f6Do!^#=12=?( , "onse\001\377\a\000\000\000\000\000\000\000\000"..., > duration = {static kUninitialized = -9223372036854775808, > nano_delta_ = -9223372036854775808}, start_mono_time = {static > kNanosecondsPerSecond = 10, static kNanosecondsPerMillisecond = > 100, > static kNanosecondsPerMicrosecond = 1000, static kMicrosecondsPerSecond = > 100, nanos_ = 32139819439241529}} > scoped_cleanup_L582 = > trace = > sw = > {code} > In the Tablet server's log, saw this: > {code:java} > I0526 09:47:39.229526 86465 tablet_replica.cc:291] T > 2c61e21e2e0b4caba1736b5c248dd65e P c12ad54315b24a61b8c47ccd7a3ddf7e: stopping > tablet replica > I0526 09:47:39.230662 86464 ts_tablet_manager.cc:1552] T > 02e056b7c982476db5bd5249f7806cbd P c12ad54315b24a61b8c47ccd7a3ddf7e: Deleting > tablet data with delete state TABLET_DATA_DELETED > I0526 09:47:39.234947 164344 maintenance_manager.cc:373] P > c12ad54315b24a61b8c47ccd7a3ddf7e: Scheduling > CompactRowSetsOp(2c61e21e2e0b4caba1736b5c248dd65e): perf score=0.012862 > I0526 09:47:39.234983 86465 raft_consensus.cc:2226] T > 2c61e21e2e0b4caba1736b5c248dd65e P c12ad54315b24a61b8c47ccd7a3ddf7e [term 1 > FOLLOWER]: Raft consensus shutting down. > I0526 09:47:39.235006 86465 raft_consensus.cc:2255] T > 2c61e21e2e0b4caba1736b5c248dd65e P c12ad54315b24a61b8c47ccd7a3ddf7e [term 1 > FOLLOWER]: Raft consensus is shut down! > {code} > Tablet server tried to perform RowSet Compacting on a Deleting tablet. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KUDU-3258) Expose some kind of transaction dashboard in ksck or the web UI
[ https://issues.apache.org/jira/browse/KUDU-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong reassigned KUDU-3258: - Assignee: Andrew Wong > Expose some kind of transaction dashboard in ksck or the web UI > --- > > Key: KUDU-3258 > URL: https://issues.apache.org/jira/browse/KUDU-3258 > Project: Kudu > Issue Type: Improvement > Components: ops-tooling, transactions >Reporter: Andrew Wong >Assignee: Andrew Wong >Priority: Major > > It would be useful to expose the locations and tablet IDs of the > TxnStatusManager replicas, and even show the health of them from unified > front, whether that's the web UI, ksck, or both. Some useful things to know > about: > - The tablet ID, range, and location of each TxnStatusManager partition > - The highest transaction ID per TxnStatusManager partition > - In-flight (not COMMITTED or ABORTED) transactions and their current state, > though would also be nice to filter specific states > - Commit timestamp (and other relevant timestamps, if available, reported > with physical and logical portions) > - We could also consider storing the transaction creation time in the same > way that we have a "time created" for tables in the masters > After some discussion with Alexey, we think it'd be more useful to focus on: > * having a separate section in ksck to display the health of the transaction > status table > * having a separate tool to focus on displaying the business logic of the > TxnStatusManager partitions (not the web UI, for now) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-3271) Tablet server crashed when handle scan request
[ https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-3271. --- Fix Version/s: 1.13.0 Resolution: Fixed I checked out the commit before 163cd25 and copied over the test in the patch. After running it a couple times, I ran into: {code:java} I0408 22:49:44.993857 54213 ts_tablet_manager.cc:1144] T P dbfd161726d64fa0b01e8a9237fb37d1: Time spent starting tablet: real 0.004s user 0.002s sys 0.002s I0408 22:49:44.993940 54215 raft_consensus.cc:683] T P dbfd161726d64fa0b01e8a9237fb37d1 [term 1 LEADER]: Becoming Leader. State: Replica: dbfd161726d64fa0b01e8a9237fb37d1, State: Running, Role: LEADER W0408 22:49:44.993994 54151 reactor.cc:681] Failed to create an outbound connection to 255.255.255.255:1 because connect() failed: Network error: connect(2) error: Network is unreachable (error 101) I0408 22:49:44.994019 54215 consensus_queue.cc:227] T P dbfd161726d64fa0b01e8a9237fb37d1 [LEADER]: Queue going to LEADER mode. State: All replicated index: 0, Majority replicated index: 0, Committed index: 0, Last appended: 0.0, Last appended by leader: 0, Current term: 1, Majority size: 1, State: 0, Mode: LEADER, active raft config: opid_index: -1 peers { permanent_uuid: "dbfd161726d64fa0b01e8a9237fb37d1" member_type: VOTER last_known_addr { host: "127.0.0.1" port: 44157 } } *** Aborted at 1617947385 (unix time) try "date -d @1617947385" if you are using GNU date *** I0408 22:49:45.024998 54168 tablet_service.cc:2747] Scan: Not found: Scanner c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence id=100, remote={username='awong'} at 127.0.0.1:60548 I0408 22:49:45.025013 54167 tablet_service.cc:2747] Scan: Not found: Scanner c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence id=100, remote={username='awong'} at 127.0.0.1:60548 I0408 22:49:45.025015 54166 tablet_service.cc:2747] Scan: Not found: Scanner c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence id=101, remote={username='awong'} at 127.0.0.1:60548 I0408 22:49:45.025023 54163 tablet_service.cc:2747] Scan: Not found: Scanner c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence id=100, remote={username='awong'} at 127.0.0.1:60548 I0408 22:49:45.025087 54167 tablet_service.cc:2747] Scan: Not found: Scanner c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence id=101, remote={username='awong'} at 127.0.0.1:60548 I0408 22:49:45.025157 54167 tablet_service.cc:2747] Scan: Not found: Scanner c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence id=100, remote={username='awong'} at 127.0.0.1:60548 PC: @ 0x229eed3 kudu::UnionIterator::HasNext() *** SIGSEGV (@0x0) received by PID 54140 (TID 0x7fa30cfde700) from PID 0; stack trace: *** @ 0x7fa31d2b9370 (unknown) @ 0x229eed3 kudu::UnionIterator::HasNext() @ 0xb3300c kudu::tserver::TabletServiceImpl::HandleContinueScanRequest() @ 0xb45a09 kudu::tserver::TabletServiceImpl::Scan() @ 0x2227b79 kudu::rpc::GeneratedServiceIf::Handle() @ 0x2228839 kudu::rpc::ServicePool::RunThread() @ 0x23af01f kudu::Thread::SuperviseThread() @ 0x7fa31d2b1dc5 start_thread @ 0x7fa31b60976d __clone Segmentation fault {code} So I think it's safe to say this was indeed addressed by Todd's locking commit. [~zhangyifan27] If you're able, feel free to pull 163cd25 into your version of Kudu to prevent this in the future, or consider upgrading to 1.13 or higher. 
> Tablet server crashed when handle scan request > -- > > Key: KUDU-3271 > URL: https://issues.apache.org/jira/browse/KUDU-3271 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.12.0 >Reporter: YifanZhang >Priority: Major > Fix For: 1.13.0 > > Attachments: tablet-52a743.log > > > We found that one of kudu tablet server crashed when handle scan request. The > scanned table didn't have any row operations at that time. This issue only > came up once so far. > Coredump stack is: > {code:java} > Program terminated with signal 11, Segmentation fault. > (gdb) bt > #0 kudu::tablet::DeltaApplier::HasNext (this=) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/delta_applier.cc:84 > #1 0x02185900 in kudu::UnionIterator::HasNext (this=) > at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:1051 > #2 0x00a2ea8f in kudu::tserver::ScannerManager::UnregisterScanner > (this=0x4fea140, scanner_id=...) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.cc:195 > #3 0x009e
[jira] [Commented] (KUDU-3271) Tablet server crashed when handle scan request
[ https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317570#comment-17317570 ] Andrew Wong commented on KUDU-3271: --- I suspect this may have been fixed by [163cd25|https://github.com/apache/kudu/commit/163cd25] which landed in 1.13. I'll try checking out the code prior to that change and seeing if this is reproducible. > Tablet server crashed when handle scan request > -- > > Key: KUDU-3271 > URL: https://issues.apache.org/jira/browse/KUDU-3271 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.12.0 >Reporter: YifanZhang >Priority: Major > Attachments: tablet-52a743.log > > > We found that one of kudu tablet server crashed when handle scan request. The > scanned table didn't have any row operations at that time. This issue only > came up once so far. > Coredump stack is: > {code:java} > Program terminated with signal 11, Segmentation fault. > (gdb) bt > #0 kudu::tablet::DeltaApplier::HasNext (this=) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/delta_applier.cc:84 > #1 0x02185900 in kudu::UnionIterator::HasNext (this=) > at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:1051 > #2 0x00a2ea8f in kudu::tserver::ScannerManager::UnregisterScanner > (this=0x4fea140, scanner_id=...) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.cc:195 > #3 0x009e7adf in ~ScopedUnregisterScanner (this=0x7f2d72167610, > __in_chrg=) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.h:179 > #4 kudu::tserver::TabletServiceImpl::HandleContinueScanRequest > (this=this@entry=0x60edef0, req=req@entry=0x9582e880, > rpc_context=rpc_context@entry=0x8151d7800, > result_collector=result_collector@entry=0x7f2d721679f0, > has_more_results=has_more_results@entry=0x7f2d721678f9, > error_code=error_code@entry=0x7f2d721678fc) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2737 > #5 0x009fb009 in kudu::tserver::TabletServiceImpl::Scan > (this=0x60edef0, req=0x9582e880, resp=0xb87b16de0, context=0x8151d7800) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1907 > #6 0x0210f019 in operator() (__args#2=0x8151d7800, > __args#1=0xb87b16de0, __args#0=, this=0x4e0c7708) at > /usr/include/c++/4.8.2/functional:2471 > #7 kudu::rpc::GeneratedServiceIf::Handle (this=0x60edef0, call= out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139 > #8 0x0210fcd9 in kudu::rpc::ServicePool::RunThread (this=0x50fb9e0) > at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225 > #9 0x0228ecaf in operator() (this=0xc1a58c28) at > /usr/include/c++/4.8.2/functional:2471 > #10 kudu::Thread::SuperviseThread (arg=0xc1a58c00) at > /home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:674#11 > 0x7f2de6b8adc5 in start_thread () from /lib64/libpthread.so.0#12 > 0x7f2de4e6873d in clone () from /lib64/libc.so.6 > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
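A minimal sketch of the kind of race consistent with the stack above, assuming the crash stems from unsynchronized access to the scanner's iterator while it is being unregistered; the types and members are illustrative, and the actual fix that landed (163cd25) may be structured differently:

{code}
#include <mutex>

class RowIterator;  // stand-in for the scan iterator

// Illustrative scanner; not the real kudu::tserver::Scanner.
struct Scanner {
  std::mutex lock;
  RowIterator* iter = nullptr;  // reset when the scanner expires or completes
};

// Unregistration inspects the iterator (e.g. to note whether rows remained).
// Without the scanner's lock, a concurrent Scan RPC racing on the same
// (expired) scanner can tear the iterator down underneath us, which matches
// the SIGSEGV in DeltaApplier::HasNext above. Holding the lock serializes
// the two paths.
void UnregisterScanner(Scanner* scanner) {
  std::lock_guard<std::mutex> l(scanner->lock);
  if (scanner->iter != nullptr) {
    // safe to call scanner->iter->HasNext() and friends while the lock is held
  }
}
{code}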
[jira] [Resolved] (KUDU-3268) Crash in TabletServerDiskErrorTest.TestRandomOpSequence
[ https://issues.apache.org/jira/browse/KUDU-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-3268. --- Fix Version/s: 1.15.0 Resolution: Fixed > Crash in TabletServerDiskErrorTest.TestRandomOpSequence > --- > > Key: KUDU-3268 > URL: https://issues.apache.org/jira/browse/KUDU-3268 > Project: Kudu > Issue Type: Bug > Components: test, tserver >Reporter: Andrew Wong >Priority: Major > Fix For: 1.15.0 > > Attachments: tablet_server-test.3.txt.gz > > > A pre-commit failed with the following crash when attempting to launch an op > after stopping a replica: > {code:java} > I0323 18:15:01.078991 23854 maintenance_manager.cc:373] P > c8a93089db0041f5930b9fb1832714ed: Scheduling > CompactRowSetsOp(): perf score=1.012452 > I0323 18:15:01.079111 21067 tablet_server-test.cc:852] Tablet server > responded with: timestamp: 6621279441214984192 > I0323 18:15:01.079317 23789 maintenance_manager.cc:594] P > c8a93089db0041f5930b9fb1832714ed: > UndoDeltaBlockGCOp() complete. Timing: real > 0.000suser 0.000s sys 0.000s Metrics: > {"cfile_init":1,"lbm_read_time_us":73,"lbm_reads_lt_1ms":4} > E0323 18:15:01.080865 23788 cfile_reader.cc:591] Encountered corrupted CFile > in filesystem block: 4124746176525068430 > I0323 18:15:01.080960 23788 ts_tablet_manager.cc:1774] T > P c8a93089db0041f5930b9fb1832714ed: failing > tablet > I0323 18:15:01.080950 21067 tablet_server-test.cc:852] Tablet server > responded with: timestamp: 6621279441223315456 > I0323 18:15:01.081243 24138 tablet_replica.cc:324] T > P c8a93089db0041f5930b9fb1832714ed: stopping > tablet replica > I0323 18:15:01.081670 21067 tablet_server-test.cc:852] Tablet server > responded with: error { > code: TABLET_NOT_RUNNING > status { > code: ILLEGAL_STATE > message: "Tablet not RUNNING: STOPPING" > } > } > I0323 18:15:01.081777 21067 tablet_server-test.cc:890] Failure was caught by > an op! > W0323 18:15:01.082907 23788 tablet_mm_ops.cc:176] T > P c8a93089db0041f5930b9fb1832714ed: > Compaction failed on : Corruption: Flush to > disk failed: checksum error on CFile block 4124746176525068430 at offset=1006 > size=24: Checksum does not match: 3582029077 vs expected 3582029077 > I0323 18:15:01.082957 23788 maintenance_manager.cc:594] P > c8a93089db0041f5930b9fb1832714ed: > CompactRowSetsOp() complete. Timing: real > 0.004s user 0.003s sys 0.000s Metrics: > {"cfile_cache_miss":3,"cfile_cache_miss_bytes":92,"delta_iterators_relevant":2,"dirs.queue_time_us":630,"dirs.run_cpu_time_us":368,"dirs.run_wall_time_us":2220,"lbm_read_time_us":54,"lbm_reads_lt_1ms":3,"lbm_write_time_us":168,"lbm_writes_lt_1ms":6,"num_input_rowsets":2,"spinlock_wait_cycles":1792,"tablet-open.queue_time_us":135,"thread_start_us":382,"threads_started":5} > I0323 18:15:01.083369 23854 maintenance_manager.cc:373] P > c8a93089db0041f5930b9fb1832714ed: Scheduling > CompactRowSetsOp(): perf score=1.012452 > *** Aborted at 1616523301 (unix time) try "date -d @1616523301" if you are > using GNU date *** > I0323 18:15:01.083519 24138 raft_consensus.cc:2226] T > P c8a93089db0041f5930b9fb1832714ed [term 1 > LEADER]: Raft consensus shutting down. > I0323 18:15:01.083653 24138 raft_consensus.cc:2255] T > P c8a93089db0041f5930b9fb1832714ed [term 1 > FOLLOWER]: Raft consensus is shut down! > I0323 18:15:01.085090 21067 tablet_server-test.cc:894] Tablet was > successfully failed > I0323 18:15:01.085439 21067 tablet_server.cc:166] TabletServer@127.0.0.1:0 > shutting down... 
> PC: @ 0x7ff97596de0d kudu::MaintenanceManager::LaunchOp() > *** SIGSEGV (@0x30) received by PID 21067 (TID 0x7ff96343b700) from PID 48; > stack trace: *** > @ 0x7ff976846980 (unknown) at ??:0 > @ 0x7ff97596de0d kudu::MaintenanceManager::LaunchOp() at ??:0 > @ 0x7ff97596b538 > _ZZN4kudu18MaintenanceManager18RunSchedulerThreadEvENKUlvE_clEv at ??:0 > @ 0x7ff97596f124 > _ZNSt17_Function_handlerIFvvEZN4kudu18MaintenanceManager18RunSchedulerThreadEvEUlvE_E9_M_invokeERKSt9_Any_data > at ??:0 > @ 0x7ff977e2bcf4 std::function<>::operator()() at ??:0 > @ 0x7ff975a05e6e kudu::ThreadPool::DispatchThread() at ??:0 > @ 0x7ff975a06757 _ZZN4kudu10ThreadPool12CreateThreadEvENKUlvE_clEv at > ??:0 > @ 0x7ff975a07e7b > _ZNSt17_Function_handlerIFvvEZN4kudu10ThreadPool12CreateThreadEvEUlvE_E9_M_invokeERKSt9_Any_data > at ??:0 > @ 0x7ff977e2bcf4 std::function
[jira] [Comment Edited] (KUDU-3271) Tablet server crashed when handle scan request
[ https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314068#comment-17314068 ] Andrew Wong edited comment on KUDU-3271 at 4/2/21, 8:20 PM: [~zhangyifan27] Thanks for reporting this. Is there anything in the INFO logs that you think might be useful in getting to the bottom of this? Do you know what scans were running around this time? Were there any special or new workloads running that hadn't run before? was (Author: andrew.wong): [~zhangyifan27] Thanks for reporting this. Is there anything in the INFO logs that you think might be useful in getting to the bottom of this? > Tablet server crashed when handle scan request > -- > > Key: KUDU-3271 > URL: https://issues.apache.org/jira/browse/KUDU-3271 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.12.0 >Reporter: YifanZhang >Priority: Major > > We found that one of kudu tablet server crashed when handle scan request. The > scanned table didn't have any row operations at that time. This issue only > came up once so far. > Coredump stack is: > {code:java} > Program terminated with signal 11, Segmentation fault. > (gdb) bt > #0 kudu::tablet::DeltaApplier::HasNext (this=) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/delta_applier.cc:84 > #1 0x02185900 in kudu::UnionIterator::HasNext (this=) > at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:1051 > #2 0x00a2ea8f in kudu::tserver::ScannerManager::UnregisterScanner > (this=0x4fea140, scanner_id=...) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.cc:195 > #3 0x009e7adf in ~ScopedUnregisterScanner (this=0x7f2d72167610, > __in_chrg=) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.h:179 > #4 kudu::tserver::TabletServiceImpl::HandleContinueScanRequest > (this=this@entry=0x60edef0, req=req@entry=0x9582e880, > rpc_context=rpc_context@entry=0x8151d7800, > result_collector=result_collector@entry=0x7f2d721679f0, > has_more_results=has_more_results@entry=0x7f2d721678f9, > error_code=error_code@entry=0x7f2d721678fc) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2737 > #5 0x009fb009 in kudu::tserver::TabletServiceImpl::Scan > (this=0x60edef0, req=0x9582e880, resp=0xb87b16de0, context=0x8151d7800) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1907 > #6 0x0210f019 in operator() (__args#2=0x8151d7800, > __args#1=0xb87b16de0, __args#0=, this=0x4e0c7708) at > /usr/include/c++/4.8.2/functional:2471 > #7 kudu::rpc::GeneratedServiceIf::Handle (this=0x60edef0, call= out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139 > #8 0x0210fcd9 in kudu::rpc::ServicePool::RunThread (this=0x50fb9e0) > at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225 > #9 0x0228ecaf in operator() (this=0xc1a58c28) at > /usr/include/c++/4.8.2/functional:2471 > #10 kudu::Thread::SuperviseThread (arg=0xc1a58c00) at > /home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:674#11 > 0x7f2de6b8adc5 in start_thread () from /lib64/libpthread.so.0#12 > 0x7f2de4e6873d in clone () from /lib64/libc.so.6 > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3271) Tablet server crashed when handle scan request
[ https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314068#comment-17314068 ] Andrew Wong commented on KUDU-3271: --- [~zhangyifan27] Thanks for reporting this. Is there anything in the INFO logs that you think might be useful in getting to the bottom of this? > Tablet server crashed when handle scan request > -- > > Key: KUDU-3271 > URL: https://issues.apache.org/jira/browse/KUDU-3271 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.12.0 >Reporter: YifanZhang >Priority: Major > > We found that one of kudu tablet server crashed when handle scan request. The > scanned table didn't have any row operations at that time. This issue only > came up once so far. > Coredump stack is: > {code:java} > Program terminated with signal 11, Segmentation fault. > (gdb) bt > #0 kudu::tablet::DeltaApplier::HasNext (this=) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/delta_applier.cc:84 > #1 0x02185900 in kudu::UnionIterator::HasNext (this=) > at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:1051 > #2 0x00a2ea8f in kudu::tserver::ScannerManager::UnregisterScanner > (this=0x4fea140, scanner_id=...) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.cc:195 > #3 0x009e7adf in ~ScopedUnregisterScanner (this=0x7f2d72167610, > __in_chrg=) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.h:179 > #4 kudu::tserver::TabletServiceImpl::HandleContinueScanRequest > (this=this@entry=0x60edef0, req=req@entry=0x9582e880, > rpc_context=rpc_context@entry=0x8151d7800, > result_collector=result_collector@entry=0x7f2d721679f0, > has_more_results=has_more_results@entry=0x7f2d721678f9, > error_code=error_code@entry=0x7f2d721678fc) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2737 > #5 0x009fb009 in kudu::tserver::TabletServiceImpl::Scan > (this=0x60edef0, req=0x9582e880, resp=0xb87b16de0, context=0x8151d7800) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1907 > #6 0x0210f019 in operator() (__args#2=0x8151d7800, > __args#1=0xb87b16de0, __args#0=, this=0x4e0c7708) at > /usr/include/c++/4.8.2/functional:2471 > #7 kudu::rpc::GeneratedServiceIf::Handle (this=0x60edef0, call= out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139 > #8 0x0210fcd9 in kudu::rpc::ServicePool::RunThread (this=0x50fb9e0) > at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225 > #9 0x0228ecaf in operator() (this=0xc1a58c28) at > /usr/include/c++/4.8.2/functional:2471 > #10 kudu::Thread::SuperviseThread (arg=0xc1a58c00) at > /home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:674#11 > 0x7f2de6b8adc5 in start_thread () from /lib64/libpthread.so.0#12 > 0x7f2de4e6873d in clone () from /lib64/libc.so.6 > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3268) Crash in TabletServerDiskErrorTest.TestRandomOpSequence
[ https://issues.apache.org/jira/browse/KUDU-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong updated KUDU-3268: -- Description: A pre-commit failed with the following crash when attempting to launch an op after stopping a replica: {code:java} I0323 18:15:01.078991 23854 maintenance_manager.cc:373] P c8a93089db0041f5930b9fb1832714ed: Scheduling CompactRowSetsOp(): perf score=1.012452 I0323 18:15:01.079111 21067 tablet_server-test.cc:852] Tablet server responded with: timestamp: 6621279441214984192 I0323 18:15:01.079317 23789 maintenance_manager.cc:594] P c8a93089db0041f5930b9fb1832714ed: UndoDeltaBlockGCOp() complete. Timing: real 0.000s user 0.000s sys 0.000s Metrics: {"cfile_init":1,"lbm_read_time_us":73,"lbm_reads_lt_1ms":4} E0323 18:15:01.080865 23788 cfile_reader.cc:591] Encountered corrupted CFile in filesystem block: 4124746176525068430 I0323 18:15:01.080960 23788 ts_tablet_manager.cc:1774] T P c8a93089db0041f5930b9fb1832714ed: failing tablet I0323 18:15:01.080950 21067 tablet_server-test.cc:852] Tablet server responded with: timestamp: 6621279441223315456 I0323 18:15:01.081243 24138 tablet_replica.cc:324] T P c8a93089db0041f5930b9fb1832714ed: stopping tablet replica I0323 18:15:01.081670 21067 tablet_server-test.cc:852] Tablet server responded with: error { code: TABLET_NOT_RUNNING status { code: ILLEGAL_STATE message: "Tablet not RUNNING: STOPPING" } } I0323 18:15:01.081777 21067 tablet_server-test.cc:890] Failure was caught by an op! W0323 18:15:01.082907 23788 tablet_mm_ops.cc:176] T P c8a93089db0041f5930b9fb1832714ed: Compaction failed on : Corruption: Flush to disk failed: checksum error on CFile block 4124746176525068430 at offset=1006 size=24: Checksum does not match: 3582029077 vs expected 3582029077 I0323 18:15:01.082957 23788 maintenance_manager.cc:594] P c8a93089db0041f5930b9fb1832714ed: CompactRowSetsOp() complete. Timing: real 0.004suser 0.003s sys 0.000s Metrics: {"cfile_cache_miss":3,"cfile_cache_miss_bytes":92,"delta_iterators_relevant":2,"dirs.queue_time_us":630,"dirs.run_cpu_time_us":368,"dirs.run_wall_time_us":2220,"lbm_read_time_us":54,"lbm_reads_lt_1ms":3,"lbm_write_time_us":168,"lbm_writes_lt_1ms":6,"num_input_rowsets":2,"spinlock_wait_cycles":1792,"tablet-open.queue_time_us":135,"thread_start_us":382,"threads_started":5} I0323 18:15:01.083369 23854 maintenance_manager.cc:373] P c8a93089db0041f5930b9fb1832714ed: Scheduling CompactRowSetsOp(): perf score=1.012452 *** Aborted at 1616523301 (unix time) try "date -d @1616523301" if you are using GNU date *** I0323 18:15:01.083519 24138 raft_consensus.cc:2226] T P c8a93089db0041f5930b9fb1832714ed [term 1 LEADER]: Raft consensus shutting down. I0323 18:15:01.083653 24138 raft_consensus.cc:2255] T P c8a93089db0041f5930b9fb1832714ed [term 1 FOLLOWER]: Raft consensus is shut down! I0323 18:15:01.085090 21067 tablet_server-test.cc:894] Tablet was successfully failed I0323 18:15:01.085439 21067 tablet_server.cc:166] TabletServer@127.0.0.1:0 shutting down... 
PC: @ 0x7ff97596de0d kudu::MaintenanceManager::LaunchOp() *** SIGSEGV (@0x30) received by PID 21067 (TID 0x7ff96343b700) from PID 48; stack trace: *** @ 0x7ff976846980 (unknown) at ??:0 @ 0x7ff97596de0d kudu::MaintenanceManager::LaunchOp() at ??:0 @ 0x7ff97596b538 _ZZN4kudu18MaintenanceManager18RunSchedulerThreadEvENKUlvE_clEv at ??:0 @ 0x7ff97596f124 _ZNSt17_Function_handlerIFvvEZN4kudu18MaintenanceManager18RunSchedulerThreadEvEUlvE_E9_M_invokeERKSt9_Any_data at ??:0 @ 0x7ff977e2bcf4 std::function<>::operator()() at ??:0 @ 0x7ff975a05e6e kudu::ThreadPool::DispatchThread() at ??:0 @ 0x7ff975a06757 _ZZN4kudu10ThreadPool12CreateThreadEvENKUlvE_clEv at ??:0 @ 0x7ff975a07e7b _ZNSt17_Function_handlerIFvvEZN4kudu10ThreadPool12CreateThreadEvEUlvE_E9_M_invokeERKSt9_Any_data at ??:0 @ 0x7ff977e2bcf4 std::function<>::operator()() at ??:0 @ 0x7ff9759f8913 kudu::Thread::SuperviseThread() at ??:0 @ 0x7ff97683b6db start_thread at ??:0 @ 0x7ff97388e71f clone at ??:0 {code} Attached the full logs -- seems there's something unsafe about how we unregister ops (maybe from the fix for KUDU-3149?) when racing with the scheduler thread. was: A pre-commit failed with the following crash when attempting to launch an op after stopping a replica: {code:java} I0323 18:15:01.079317 23789 maintenance_manager.cc:594] P c8a93089db0041f5930b9fb1832714ed: UndoDeltaBlockGCOp() complete. Timing: real 0.000s user 0.00
[jira] [Created] (KUDU-3268) Crash in TabletServerDiskErrorTest.TestRandomOpSequence
Andrew Wong created KUDU-3268: - Summary: Crash in TabletServerDiskErrorTest.TestRandomOpSequence Key: KUDU-3268 URL: https://issues.apache.org/jira/browse/KUDU-3268 Project: Kudu Issue Type: Bug Components: test, tserver Reporter: Andrew Wong Attachments: tablet_server-test.3.txt.gz A pre-commit failed with the following crash when attempting to launch an op after stopping a replica: {code:java} I0323 18:15:01.079317 23789 maintenance_manager.cc:594] P c8a93089db0041f5930b9fb1832714ed: UndoDeltaBlockGCOp() complete. Timing: real 0.000s user 0.000s sys 0.000s Metrics: {"cfile_init":1,"lbm_read_time_us":73,"lbm_reads_lt_1ms":4}I0323 18:15:01.079317 23789 maintenance_manager.cc:594] P c8a93089db0041f5930b9fb1832714ed: UndoDeltaBlockGCOp() complete. Timing: real 0.000s user 0.000s sys 0.000s Metrics: {"cfile_init":1,"lbm_read_time_us":73,"lbm_reads_lt_1ms":4}E0323 18:15:01.080865 23788 cfile_reader.cc:591] Encountered corrupted CFile in filesystem block: 4124746176525068430I0323 18:15:01.080960 23788 ts_tablet_manager.cc:1774] T P c8a93089db0041f5930b9fb1832714ed: failing tabletI0323 18:15:01.080950 21067 tablet_server-test.cc:852] Tablet server responded with: timestamp: 6621279441223315456I0323 18:15:01.081243 24138 tablet_replica.cc:324] T P c8a93089db0041f5930b9fb1832714ed: stopping tablet replicaI0323 18:15:01.081670 21067 tablet_server-test.cc:852] Tablet server responded with: error { code: TABLET_NOT_RUNNING status { code: ILLEGAL_STATE message: "Tablet not RUNNING: STOPPING" }}I0323 18:15:01.081777 21067 tablet_server-test.cc:890] Failure was caught by an op!W0323 18:15:01.082907 23788 tablet_mm_ops.cc:176] T P c8a93089db0041f5930b9fb1832714ed: Compaction failed on : Corruption: Flush to disk failed: checksum error on CFile block 4124746176525068430 at offset=1006 size=24: Checksum does not match: 3582029077 vs expected 3582029077I0323 18:15:01.082957 23788 maintenance_manager.cc:594] P c8a93089db0041f5930b9fb1832714ed: CompactRowSetsOp() complete. 
Timing: real 0.004s user 0.003s sys 0.000s Metrics: {"cfile_cache_miss":3,"cfile_cache_miss_bytes":92,"delta_iterators_relevant":2,"dirs.queue_time_us":630,"dirs.run_cpu_time_us":368,"dirs.run_wall_time_us":2220,"lbm_read_time_us":54,"lbm_reads_lt_1ms":3,"lbm_write_time_us":168,"lbm_writes_lt_1ms":6,"num_input_rowsets":2,"spinlock_wait_cycles":1792,"tablet-open.queue_time_us":135,"thread_start_us":382,"threads_started":5}I0323 18:15:01.083369 23854 maintenance_manager.cc:373] P c8a93089db0041f5930b9fb1832714ed: Scheduling CompactRowSetsOp(): perf score=1.012452*** Aborted at 1616523301 (unix time) try "date -d @1616523301" if you are using GNU date ***I0323 18:15:01.083519 24138 raft_consensus.cc:2226] T P c8a93089db0041f5930b9fb1832714ed [term 1 LEADER]: Raft consensus shutting down.I0323 18:15:01.083653 24138 raft_consensus.cc:2255] T P c8a93089db0041f5930b9fb1832714ed [term 1 FOLLOWER]: Raft consensus is shut down!I0323 18:15:01.085090 21067 tablet_server-test.cc:894] Tablet was successfully failedI0323 18:15:01.085439 21067 tablet_server.cc:166] TabletServer@127.0.0.1:0 shutting down...PC: @ 0x7ff97596de0d kudu::MaintenanceManager::LaunchOp()*** SIGSEGV (@0x30) received by PID 21067 (TID 0x7ff96343b700) from PID 48; stack trace: *** @ 0x7ff976846980 (unknown) at ??:0 @ 0x7ff97596de0d kudu::MaintenanceManager::LaunchOp() at ??:0 @ 0x7ff97596b538 _ZZN4kudu18MaintenanceManager18RunSchedulerThreadEvENKUlvE_clEv at ??:0 @ 0x7ff97596f124 _ZNSt17_Function_handlerIFvvEZN4kudu18MaintenanceManager18RunSchedulerThreadEvEUlvE_E9_M_invokeERKSt9_Any_data at ??:0 @ 0x7ff977e2bcf4 std::function<>::operator()() at ??:0 @ 0x7ff975a05e6e kudu::ThreadPool::DispatchThread() at ??:0 @ 0x7ff975a06757 _ZZN4kudu10ThreadPool12CreateThreadEvENKUlvE_clEv at ??:0 @ 0x7ff975a07e7b _ZNSt17_Function_handlerIFvvEZN4kudu10ThreadPool12CreateThreadEvEUlvE_E9_M_invokeERKSt9_Any_data at ??:0 @ 0x7ff977e2bcf4 std::function<>::operator()() at ??:0 @ 0x7ff9759f8913 kudu::Thread::SuperviseThread() at ??:0 @ 0x7ff97683b6db start_thread at ??:0 @ 0x7ff97388e71f clone at ??:0 {code} Attached the full logs -- seems there's something unsafe about how we unregister ops (maybe from the fix for KUDU-3149?) when racing with the scheduler thread. -- This message was sent by Atlassian Jira (v8.3.4#803005)
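A minimal sketch of the suspected race called out in the description, assuming the scheduler thread holds a pointer to a maintenance op across a window in which unregistration (e.g. during tablet deletion) can destroy it; the class and member names are simplified stand-ins for the MaintenanceManager internals seen in the stack trace:

{code}
#include <mutex>

class MaintenanceOp;  // stand-in

class Scheduler {
 public:
  void RunSchedulerThread() {
    MaintenanceOp* best = nullptr;
    {
      std::lock_guard<std::mutex> l(lock_);
      best = FindBestOp();  // chosen while registered and under the lock
    }
    // Window: the lock is released here, so a concurrent UnregisterOp() from
    // a tablet being deleted can destroy `best` before it is launched.
    LaunchOp(best);  // potential use-after-free -> the SIGSEGV above
  }

 private:
  MaintenanceOp* FindBestOp();
  void LaunchOp(MaintenanceOp* op);
  std::mutex lock_;
};
{code}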
[jira] [Resolved] (KUDU-3154) RangerClientTestBase.TestLogging sometimes fails
[ https://issues.apache.org/jira/browse/KUDU-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-3154. --- Fix Version/s: 1.13.0 Resolution: Fixed The merged patch fixed it in our environment. Separately, we've seen that exceptions in the Ranger client are what typically causes these kinds of hangs. > RangerClientTestBase.TestLogging sometimes fails > > > Key: KUDU-3154 > URL: https://issues.apache.org/jira/browse/KUDU-3154 > Project: Kudu > Issue Type: Bug > Components: ranger, test >Affects Versions: 1.13.0 >Reporter: Alexey Serbin >Priority: Major > Fix For: 1.13.0 > > Attachments: kudu-3154_jstacks.txt, ranger_client-test.txt, > ranger_client-test.txt.xz > > > The {{RangerClientTestBase.TestLogging}} scenario of the > {{ranger_client-test}} sometimes fails (all types of builds) with error > message like below: > {noformat} > src/kudu/ranger/ranger_client-test.cc:398: Failure > Failed > > Bad status: Timed out: timed out while in flight > > I0620 07:06:02.907177 1140 server.cc:247] Received an EOF from the > subprocess > I0620 07:06:02.910923 1137 server.cc:317] get failed, inbound queue shut > down: Aborted: > I0620 07:06:02.910964 1141 server.cc:380] outbound queue shut down: Aborted: > > I0620 07:06:02.910995 1138 server.cc:317] get failed, inbound queue shut > down: Aborted: > I0620 07:06:02.910984 1139 server.cc:317] get failed, inbound queue shut > down: Aborted: > {noformat} > The log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3258) Expose some kind of transaction dashboard in ksck or the web UI
[ https://issues.apache.org/jira/browse/KUDU-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong updated KUDU-3258: -- Description: It would be useful to expose the locations and tablet IDs of the TxnStatusManager replicas, and even show the health of them from unified front, whether that's the web UI, ksck, or both. Some useful things to know about: - The tablet ID, range, and location of each TxnStatusManager partition - The highest transaction ID per TxnStatusManager partition - In-flight (not COMMITTED or ABORTED) transactions and their current state, though would also be nice to filter specific states - Commit timestamp (and other relevant timestamps, if available, reported with physical and logical portions) - We could also consider storing the transaction creation time in the same way that we have a "time created" for tables in the masters After some discussion with Alexey, we think it'd be more useful to focus on: * having a separate section in ksck to display the health of the transaction status table * having a separate tool to focus on displaying the business logic of the TxnStatusManager partitions (not the web UI, for now) was: It would be useful to expose the locations and tablet IDs of the TxnStatusManager replicas, and even show the health of them from unified front, whether that's the web UI, ksck, or both. Some useful things to know about: - The tablet ID, range, and location of each TxnStatusManager partition - The highest transaction ID per TxnStatusManager partition - In-flight (not COMMITTED or ABORTED) transactions and their current state > Expose some kind of transaction dashboard in ksck or the web UI > --- > > Key: KUDU-3258 > URL: https://issues.apache.org/jira/browse/KUDU-3258 > Project: Kudu > Issue Type: Improvement > Components: ops-tooling, transactions >Reporter: Andrew Wong >Priority: Major > > It would be useful to expose the locations and tablet IDs of the > TxnStatusManager replicas, and even show the health of them from unified > front, whether that's the web UI, ksck, or both. Some useful things to know > about: > - The tablet ID, range, and location of each TxnStatusManager partition > - The highest transaction ID per TxnStatusManager partition > - In-flight (not COMMITTED or ABORTED) transactions and their current state, > though would also be nice to filter specific states > - Commit timestamp (and other relevant timestamps, if available, reported > with physical and logical portions) > - We could also consider storing the transaction creation time in the same > way that we have a "time created" for tables in the masters > After some discussion with Alexey, we think it'd be more useful to focus on: > * having a separate section in ksck to display the health of the transaction > status table > * having a separate tool to focus on displaying the business logic of the > TxnStatusManager partitions (not the web UI, for now) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-3226) Validate List of Masters for kudu ksck
[ https://issues.apache.org/jira/browse/KUDU-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-3226. --- Fix Version/s: 1.15.0 Resolution: Fixed > Validate List of Masters for kudu ksck > -- > > Key: KUDU-3226 > URL: https://issues.apache.org/jira/browse/KUDU-3226 > Project: Kudu > Issue Type: Improvement >Reporter: David Mollitor >Assignee: Abhishek >Priority: Minor > Labels: beginner, newbie, trivial > Fix For: 1.15.0 > > > I recently accidentally included a list of masters where I fat-fingered the > host names and included the same node twice. > I got a stack trace and an error message about duplicate keys inserted > into a map. I eventually figured out what I did wrong, but the error > condition was not super helpful. > Please add an early validation step that explicitly captures this condition > and provides a helpful error message that includes the host name > which was duplicated on the command line. > This happened for me with {{kudu ksck}} but there may be other candidates as > well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
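A minimal sketch of the early validation requested here (not necessarily how the merged fix is implemented): reject a duplicated master address up front and name the offending host in the error message.

{code}
#include <set>
#include <string>
#include <vector>

#include "kudu/util/status.h"

// Sketch: fail fast with a descriptive message instead of surfacing a
// duplicate-key error from an internal map later on.
kudu::Status ValidateMasterAddresses(const std::vector<std::string>& masters) {
  std::set<std::string> seen;
  for (const auto& addr : masters) {
    if (!seen.insert(addr).second) {
      return kudu::Status::InvalidArgument(
          "master address specified more than once on the command line: " + addr);
    }
  }
  return kudu::Status::OK();
}
{code}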
[jira] [Updated] (KUDU-3261) Support updates and deletes in transactions
[ https://issues.apache.org/jira/browse/KUDU-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong updated KUDU-3261: -- Description: Kudu currently only supports multi-row, multi-partition transactions for INSERT and INSERT_IGNORE operations. We should consider extending the Kudu tablet store to support: - Tracking Mutations in the MRS that are associated with a transaction - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per rowset per transaction. These delta trackers should be merged with the main delta trackers of each DRS. I'm not sure if it will be helpful, but I have [a patch|https://gerrit.cloudera.org/c/16387/] to encapsulate some of the delta applying logic – I suspect it might be useful in defining a delta iterator that spits out delta keys with a singular timestamp, as well as for defining a "mergeable" delta input (note we have a merge now, but it does a simple sequential merge of delta stores with the assumption that the input iterators are disjoint by timestamp, which may not be the case if we have transactional delta trackers that overlap in time with the main delta tracker). The DeltaReaders for the DRSs should consider the transaction's finalized commit timestamp (or lack thereof) in the same way that the MRS iterator considers mutations in the context of a snapshot. was: Kudu currently only supports multi-row, multi-partition transactions for INSERT and INSERT_IGNORE operations. We should consider extending the Kudu tablet store to support: - Tracking Mutations in the MRS that are associated with a transaction - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per rowset per transaction. These delta trackers should be merged with the main delta trackers of each DRS. I'm not sure if it will be helpful, but I have a patch to encapsulate some of the delta applying logic – I suspect it might be useful in defining a delta iterator that spits out delta keys with a singular timestamp, as well as for defining a "mergeable" delta input (note we have a merge now, but it does a simple sequential merge of delta stores with the assumption that the input iterators are disjoint by timestamp, which may not be the case if we have transactional delta trackers that overlap in time with the main delta tracker). The DeltaReaders for the DRSs should consider the transaction's finalized commit timestamp (or lack thereof) in the same way that the MRS iterator considers mutations in the context of a snapshot. > Support updates and deletes in transactions > --- > > Key: KUDU-3261 > URL: https://issues.apache.org/jira/browse/KUDU-3261 > Project: Kudu > Issue Type: Improvement > Components: tablet, transactions >Reporter: Andrew Wong >Priority: Major > > Kudu currently only supports multi-row, multi-partition transactions for > INSERT and INSERT_IGNORE operations. We should consider extending the Kudu > tablet store to support: > - Tracking Mutations in the MRS that are associated with a transaction > - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per > rowset per transaction. These delta trackers should be merged with the main > delta trackers of each DRS. 
I'm not sure if it will be helpful, but I have [a > patch|https://gerrit.cloudera.org/c/16387/] to encapsulate some of the delta > applying logic – I suspect it might be useful in defining a delta iterator > that spits out delta keys with a singular timestamp, as well as for defining > a "mergeable" delta input (note we have a merge now, but it does a simple > sequential merge of delta stores with the assumption that the input iterators > are disjoint by timestamp, which may not be the case if we have transactional > delta trackers that overlap in time with the main delta tracker). > The DeltaReaders for the DRSs should consider the transaction's finalized > commit timestamp (or lack thereof) in the same way that the MRS iterator > considers mutations in the context of a snapshot. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3261) Support updates and deletes in transactions
[ https://issues.apache.org/jira/browse/KUDU-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong updated KUDU-3261: -- Description: Kudu currently only supports multi-row, multi-partition transactions for INSERT and INSERT_IGNORE operations. We should consider extending the Kudu tablet store to support: - Tracking Mutations in the MRS that are associated with a transaction - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per rowset per transaction. These delta trackers should be merged with the main delta trackers of each DRS. I'm not sure if it will be helpful, but I have a patch to encapsulate some of the delta applying logic – I suspect it might be useful in defining a delta iterator that spits out delta keys with a singular timestamp, as well as for defining a "mergeable" delta input (note we have a merge now, but it does a simple sequential merge of delta stores with the assumption that the input iterators are disjoint by timestamp, which may not be the case if we have transactional delta trackers that overlap in time with the main delta tracker). The DeltaReaders for the DRSs should consider the transaction's finalized commit timestamp (or lack thereof) in the same way that the MRS iterator considers mutations in the context of a snapshot. was: Kudu currently only supports multi-row, multi-partition transactions for INSERT and INSERT_IGNORE operations. We should consider extending the Kudu tablet store to support: - Tracking Mutations in the MRS that are associated with a transaction - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per rowset per transaction The DeltaReaders for the DRSs should consider the transaction's finalized commit timestamp (or lack thereof) in the same way that the MRS iterator considers mutations in the context of a snapshot. > Support updates and deletes in transactions > --- > > Key: KUDU-3261 > URL: https://issues.apache.org/jira/browse/KUDU-3261 > Project: Kudu > Issue Type: Improvement > Components: tablet, transactions >Reporter: Andrew Wong >Priority: Major > > Kudu currently only supports multi-row, multi-partition transactions for > INSERT and INSERT_IGNORE operations. We should consider extending the Kudu > tablet store to support: > - Tracking Mutations in the MRS that are associated with a transaction > - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per > rowset per transaction. These delta trackers should be merged with the main > delta trackers of each DRS. I'm not sure if it will be helpful, but I have a > patch to encapsulate some of the delta applying logic – I suspect it might be > useful in defining a delta iterator that spits out delta keys with a singular > timestamp, as well as for defining a "mergeable" delta input (note we have a > merge now, but it does a simple sequential merge of delta stores with the > assumption that the input iterators are disjoint by timestamp, which may not > be the case if we have transactional delta trackers that overlap in time with > the main delta tracker). > The DeltaReaders for the DRSs should consider the transaction's finalized > commit timestamp (or lack thereof) in the same way that the MRS iterator > considers mutations in the context of a snapshot. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3262) Define more forward-looking feature flags for transactions
Andrew Wong created KUDU-3262: - Summary: Define more forward-looking feature flags for transactions Key: KUDU-3262 URL: https://issues.apache.org/jira/browse/KUDU-3262 Project: Kudu Issue Type: Improvement Components: supportability, transactions Reporter: Andrew Wong There are several extensions to transactions that seem reasonable for Kudu's roadmap. It's thus worth defining some kind of TransactionOpts, e.g. one that lets users choose a locking strategy and deadlock-control policy, so that we can more easily develop new functionality without overhauling the existing transactions API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
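To make the proposal concrete, here is one shape such an options bag could take. Every name below (TransactionOpts, LockingStrategy, DeadlockPolicy, and the fields) is an illustrative assumption, not an existing Kudu API.
{code:cpp}
#include <cstdint>

// Illustrative only: a hypothetical options bag of the kind the ticket
// proposes. New knobs can be added to the struct over time without
// changing the transactions API itself.
enum class LockingStrategy {
  kWaitDie,  // older transactions wait, younger ones abort
  kNoWait,   // fail fast on any lock conflict
};

enum class DeadlockPolicy {
  kAbortYoungest,
  kTimeoutOnly,
};

struct TransactionOpts {
  LockingStrategy locking_strategy = LockingStrategy::kWaitDie;
  DeadlockPolicy deadlock_policy = DeadlockPolicy::kAbortYoungest;
  int64_t lock_wait_timeout_ms = 10000;
};
{code}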
[jira] [Created] (KUDU-3261) Support updates and deletes in transactions
Andrew Wong created KUDU-3261: - Summary: Support updates and deletes in transactions Key: KUDU-3261 URL: https://issues.apache.org/jira/browse/KUDU-3261 Project: Kudu Issue Type: Improvement Components: tablet, transactions Reporter: Andrew Wong Kudu currently only supports multi-row, multi-partition transactions for INSERT and INSERT_IGNORE operations. We should consider extending the Kudu tablet store to support: - Tracking Mutations in the MRS that are associated with a transaction - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per rowset per transaction The DeltaReaders for the DRSs should consider the transaction's finalized commit timestamp (or lack thereof) in the same way that the MRS iterator considers mutations in the context of a snapshot. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3260) Reduce the on-disk footprint of transaction metadata in participants
Andrew Wong created KUDU-3260: - Summary: Reduce the on-disk footprint of transaction metadata in participants Key: KUDU-3260 URL: https://issues.apache.org/jira/browse/KUDU-3260 Project: Kudu Issue Type: Improvement Components: server, transactions Reporter: Andrew Wong We should remove the commit timestamp and txn metadata once we've flushed all in-memory stores that rely on metadata for determining commit timestamp. If we see tablets serving more frequent transactions, the persisted metadata may no longer be negligible. One thing to watch out for here is that we currently use metadata to determine whether a transaction has _ever_ existed on the participant – if we simply get rid of the metadata, we will lose this knowledge. More thought should be given to how and when to do this safely, and to ensuring that we clean up the metadata in such a way that no invariants are broken with regard to knowing about transaction existence (e.g. perhaps only clean up the txn metadata if the corresponding TxnStatusManager has been deleted?). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3259) Define ownership of transactions for participants to prevent malicious users from writing to a transaction
Andrew Wong created KUDU-3259: - Summary: Define ownership of transactions for participants to prevent malicious users from writing to a transaction Key: KUDU-3259 URL: https://issues.apache.org/jira/browse/KUDU-3259 Project: Kudu Issue Type: Improvement Components: security, transactions Reporter: Andrew Wong Currently, any user can write as part of a transaction. This isn't necessarily safe, though at the very least, Kudu still performs its authz checks on every write request that enters the system. When a participant calls BEGIN_TXN, we should consider also persisting the username of the writer, which should also get validated on the call to RegisterParticipant. Once successful, further write requests can be rejected if they are from other users. Note that calls to the TxnStatusManager are protected in this way (e.g. calls to commit or rollback will validate that the caller matches the 'user' field in the {{TxnStatusEntryPB}}). One thing to be cognizant of here is that if we are going to persist more metadata per transaction, we should strongly consider ways to reduce the amount of metadata stored in a single SuperBlockPB file. -- This message was sent by Atlassian Jira (v8.3.4#803005)
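A minimal sketch of the per-write ownership check proposed above, assuming the participant has persisted the username that started the transaction. The types and field names are hypothetical placeholders, not Kudu's actual classes.
{code:cpp}
#include <string>

// Hypothetical stand-ins for the participant's persisted transaction state
// and an incoming write request; the real check would live in the tablet
// server's write path.
struct TxnMetadata {
  std::string owner;  // username persisted when BEGIN_TXN was accepted
};

struct WriteRequestContext {
  std::string authenticated_user;  // caller identity from the RPC layer
};

// Returns true if the write may proceed: the caller must match the user that
// started the transaction on this participant.
bool CheckTxnOwnership(const TxnMetadata& txn, const WriteRequestContext& req,
                       std::string* error_msg) {
  if (req.authenticated_user != txn.owner) {
    *error_msg = "user '" + req.authenticated_user +
                 "' is not the owner of this transaction (owner: '" +
                 txn.owner + "')";
    return false;
  }
  return true;
}
{code}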
[jira] [Created] (KUDU-3258) Expose some kind of transaction dashboard in ksck or the web UI
Andrew Wong created KUDU-3258: - Summary: Expose some kind of transaction dashboard in ksck or the web UI Key: KUDU-3258 URL: https://issues.apache.org/jira/browse/KUDU-3258 Project: Kudu Issue Type: Improvement Components: ops-tooling, transactions Reporter: Andrew Wong It would be useful to expose the locations and tablet IDs of the TxnStatusManager replicas, and even show their health from a unified front, whether that's the web UI, ksck, or both. Some useful things to know about: - The tablet ID, range, and location of each TxnStatusManager partition - The highest transaction ID per TxnStatusManager partition - In-flight (not COMMITTED or ABORTED) transactions and their current state -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3257) Add tooling for operating on transactions
Andrew Wong created KUDU-3257: - Summary: Add tooling for operating on transactions Key: KUDU-3257 URL: https://issues.apache.org/jira/browse/KUDU-3257 Project: Kudu Issue Type: Improvement Components: ops-tooling, transactions Reporter: Andrew Wong We should expose tooling to operators who want to observe or even interfere with transactions (in case something has already gone wrong). A simple tool to wrap the TxnSystemClient seems like a great place to start, exposing commands like: Wrappers for TxnParticipant calls: - kudu remote_replica begin_txn - kudu remote_replica begin_commit - kudu remote_replica finalize_commit (should be used sparingly!) - kudu remote_replica abort_txn Wrappers for the TxnStatusManager calls: - kudu txns list - kudu txns show - kudu txns start_txn - kudu txns commit - kudu txns rollback - kudu txns keep_alive Wrappers for operating on the transaction status table: - kudu txns create_txn_status_table - kudu txns add_txn_status_table range - kudu txns drop_txn_status_table range -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3256) Limit the memory usage per transaction per tablet
Andrew Wong created KUDU-3256: - Summary: Limit the memory usage per transaction per tablet Key: KUDU-3256 URL: https://issues.apache.org/jira/browse/KUDU-3256 Project: Kudu Issue Type: Improvement Components: transactions Reporter: Andrew Wong Currently, the transactions implementation stores all new inserts in a new MRS per transaction per tablet. As transactions get larger and larger, or as there are more transactions entering the system, this will result in memory pressure across tablet servers. We should explore ways to limit the memory usage per transaction, by either enforcing a memory limit per transaction participant, or by flushing transactional MRSs before committing, per regular maintenance op cadence (e.g. based on memory pressure, MRS size, time since last flush, etc.). While it'd be significantly more complex, I'm more partial to the latter approach – the mechanics to flush an MRS already exist, so why not use them? It should be noted though that we would then need to update how bootstrapping is handled by persisting a 'last_flushed_mrs_id' per transaction, similar to what's done today for non-transactional MRSs. Additionally, the existing code to swap in new disk rowsets atomically would need some thought to ensure swapping in transactional rowsets while racing with a commit does the right thing (i.e. if we flush the transactional MRS while committing, the end result is the new DRSs should end up in the main rowset tree). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-3213) Java client should attempt a different tablet server when retrying during tserver quiescing
[ https://issues.apache.org/jira/browse/KUDU-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-3213. --- Fix Version/s: 1.15.0 Resolution: Fixed > Java client should attempt a different tablet server when retrying during > tserver quiescing > --- > > Key: KUDU-3213 > URL: https://issues.apache.org/jira/browse/KUDU-3213 > Project: Kudu > Issue Type: Bug > Components: java >Reporter: Andrew Wong >Priority: Major > Fix For: 1.15.0 > > > One of our clusters ran into the following error message when leaving a > tablet server quiesced for an extended period of time: > {code:java} > ERROR Runner: Pipeline exception occurred: org.apache.spark.SparkException: > Job aborted due to stage failure: Task 1 in stage 6.0 failed 4 times, most > recent failure: Lost task 1.3 in stage 6.0 (TID 1922, > tserver-018.edh.company.com, executor 58): > org.apache.kudu.client.NonRecoverableException: cannot complete before > timeout: ScanRequest(scannerId=null, tablet=9e17b554f85f4a7f855771d8b5c913f5, > attempt=24, KuduRpc(method=Scan, tablet=9e17b554f85f4a7f855771d8b5c913f5, > attempt=24, DeadlineTracker(timeout=3, elapsed=27988), Traces: [0ms] > refreshing cache from master, [1ms] Sub RPC GetTableLocations: sending RPC to > server master-name-003.edh.company.com:7051, [12ms] Sub RPC > GetTableLocations: received response from server > master-name-003.edh.company.com:7051: OK, [22ms] sending RPC to server > e1a4405443d845249b5ed15c8e882211, [116ms] delaying RPC due to: Service > unavailable: Tablet server is quiescing (error 0), [117ms] received response > from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet > server is quiescing (error 0), [126ms] sending RPC to server > e1a4405443d845249b5ed15c8e882211, [129ms] delaying RPC due to: Service > unavailable: Tablet server is quiescing (error 0), [129ms] received response > from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet > server is quiescing (error 0), [146ms] sending RPC to server > e1a4405443d845249b5ed15c8e882211, [149ms] delaying RPC due to: Service > unavailable: Tablet server is quiescing (error 0), [149ms] received response > from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet > server is quiescing (error 0), [166ms] sending RPC to server > e1a4405443d845249b5ed15c8e882211, [168ms] delaying RPC due to: Service > unavailable: Tablet server is quiescing (error 0), [168ms] received response > from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet > server is quiescing (error 0), [206ms] sending RPC to server > e1a4405443d845249b5ed15c8e882211, [209ms] delaying RPC due to: Service > unavailable: Tablet server is quiescing (error 0), [209ms] received response > from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet > server is quiescing (error 0), [266ms] sending RPC to server > e1a4405443d845249b5ed15c8e882211, [268ms] delaying RPC due to: Service > unavailable: Tablet server is quiescing (error 0), [268ms] received response > from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet > server is quiescing (error 0), [306ms] sending RPC to server > e1a4405443d845249b5ed15c8e882211, [308ms] delaying RPC due to: Service > unavailable: Tablet server is quiescing (error 0), [308ms] received response > from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet > server is quiescing (error 0), [545ms] sending RPC to server > e1a4405443d845249b5ed15c8e882211, [548ms] delaying RPC due to: 
Service > unavailable: Tablet server is quiescing (error 0), [548ms] received response > from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet > server is quiescing (error 0), [865ms] sending RPC to server > e1a4405443d845249b5ed15c8e882211, [868ms] delaying RPC due to: Service > unavailable: Tablet server is quiescing (error 0), [868ms] received response > from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet > server is quiescing (error 0), [1266ms] sending RPC to server > e1a4405443d845249b5ed15c8e882211, [1269ms] delaying RPC due to: Service > unavailable: Tablet server is quiescing (error 0), [1269ms] received response > from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet > server is quiescing (error 0), [2626ms] sending RPC to server > e1a4405443d845249b5ed15c8e882211, [2628ms] delaying RPC due to: Service > unavailable: Tablet server is quiescing (error 0), [2628ms] received response > from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet > server is quiescing (error 0), [4746ms] sending RPC to server > e1a4405443d845249b5ed15c8e882211, [4749
[jira] [Commented] (KUDU-3252) FsManager initialization contention if two tserver instances try to call CreateInitialFileSystemLayout on the same folder
[ https://issues.apache.org/jira/browse/KUDU-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293375#comment-17293375 ] Andrew Wong commented on KUDU-3252: --- If the issue is the absence of file locking, an implementation of WAL instance files (as might be a part of KUDU-2975) that leverages {{DirInstanceMetadataFiles}} (see [https://github.com/apache/kudu/blob/master/src/kudu/fs/dir_util.h#L74]) could help prevent this. > FsManager initialization contention if two tserver instances try to call > CreateInitialFileSystemLayout on the same folder > --- > > Key: KUDU-3252 > URL: https://issues.apache.org/jira/browse/KUDU-3252 > Project: Kudu > Issue Type: Bug >Reporter: Redriver >Priority: Critical > > I scanned the Kudu source code for DeleteDir invocations; there are 2 places: > [https://github.com/apache/kudu/blob/master/src/kudu/fs/fs_manager.cc#L384] > and > [https://github.com/apache/kudu/blob/master/src/kudu/fs/fs_manager.cc#L485]. > Imagine I start kudu-tserver twice by mistake and only one kudu-tserver ends > up running: is it possible that its folder may be removed by the other > kudu-tserver when that one fails to start? -- This message was sent by Atlassian Jira (v8.3.4#803005)
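The comment above points at per-directory instance files and file locking as the fix. As a generic illustration of that approach (not Kudu's DirInstanceMetadataFiles implementation), a process can take an exclusive advisory lock on an instance file before touching the data directory, so a second server started on the same folder fails fast instead of racing:
{code:cpp}
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

#include <cerrno>
#include <cstdio>
#include <string>

// Generic sketch, not Kudu's actual code: take an exclusive, non-blocking
// advisory lock on an instance file inside the data directory before doing
// anything destructive (e.g. CreateInitialFileSystemLayout). A second
// kudu-tserver started by mistake on the same folder fails here instead of
// racing with the first one.
int LockInstanceFile(const std::string& data_dir) {
  std::string lock_path = data_dir + "/.instance_lock";
  int fd = open(lock_path.c_str(), O_CREAT | O_RDWR, 0644);
  if (fd < 0) {
    perror("open");
    return -1;
  }
  if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
    if (errno == EWOULDBLOCK) {
      fprintf(stderr, "another server already holds %s; refusing to start\n",
              lock_path.c_str());
    } else {
      perror("flock");
    }
    close(fd);
    return -1;
  }
  // Keep 'fd' open for the lifetime of the process; the lock is released
  // automatically when the process exits.
  return fd;
}
{code}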
[jira] [Commented] (KUDU-3109) Log administrative operations
[ https://issues.apache.org/jira/browse/KUDU-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260074#comment-17260074 ] Andrew Wong commented on KUDU-3109: --- In addition to this, it'd be helpful to know about any kind of natural re-replication that has happened. I was recently looking at a case in which it seemed like some bad blocks had replicated (in an older version that didn't check checksums) to other servers. Knowing about which tablets were copied from which servers would have been helpful. This could have been pieced together via the glog output, but the logs didn't go back far enough. > Log administrative operations > - > > Key: KUDU-3109 > URL: https://issues.apache.org/jira/browse/KUDU-3109 > Project: Kudu > Issue Type: Task > Components: security >Reporter: Attila Bukor >Priority: Minor > > Sometimes it's impossible to determine what caused an issue when > administrators run unsafe commands on the cluster. Logging these in an audit > log would help. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3109) Log administrative operations
[ https://issues.apache.org/jira/browse/KUDU-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong updated KUDU-3109: -- Issue Type: New Feature (was: Task) Priority: Major (was: Minor) > Log administrative operations > - > > Key: KUDU-3109 > URL: https://issues.apache.org/jira/browse/KUDU-3109 > Project: Kudu > Issue Type: New Feature > Components: security >Reporter: Attila Bukor >Priority: Major > > Sometimes it's impossible to determine what caused an issue when > administrators run unsafe commands on the cluster. Logging these in an audit > log would help. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3229) Tooling to examine and operate on log blocks
Andrew Wong created KUDU-3229: - Summary: Tooling to examine and operate on log blocks Key: KUDU-3229 URL: https://issues.apache.org/jira/browse/KUDU-3229 Project: Kudu Issue Type: Improvement Components: fs, ops-tooling Reporter: Andrew Wong It's somewhat troublesome to examine the contents of a log block container today. Tooling exists in the form of {{kudu pbc dump}} for metadata and {{hexdump}} for data, but it'd be nice to have more specialized tooling for examining containers to understand things like: * What blocks are in this container? When was each block last updated? You can piece this together from the {{kudu pbc dump}} on the metadata, but having something more tabular might be nice. * Does each block actually contain any data? If not, which don't? * Does each block have a valid header if it were a CFile block? Some of the information I'd like to get at falls out of the purview of the log block manager itself, and requires information like what kind of blocks we're dealing with. But the underlying struggle I'd like to address is: given a container, can we be more rigorous about our checks that the data is OK, and flag blocks that appear broken? The context of this was a (Kudu version 1.5.x) case in which some form of corruption occurred, and we were left with containers that appeared to have holes punched out of them, resulting in messages complaining about bad CFile header magic values of "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" (vs the expected "kuducfl2"). The log block metadata and tablet metadata both had records of many blocks, but the corresponding locations in the data files were all zeroes. It's unclear how this happened, but even just examining the containers and blocks therein was not well-documented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
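As a rough illustration of one check such tooling could perform, the sketch below reads the first 8 bytes of a block at a given offset in a container data file and compares them against the expected CFile magic ("kuducfl2", per the report above). The function and its inputs are hypothetical; real tooling would take the offsets from the container metadata.
{code:cpp}
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iostream>
#include <string>

// Illustrative sketch only: given a container data file and a block's offset
// (taken from the container metadata), verify the first 8 bytes look like a
// CFile header. A block whose header is all zeroes, as in the case described
// above, would be flagged as suspicious.
bool BlockLooksLikeValidCFile(const std::string& data_path, uint64_t offset) {
  std::ifstream in(data_path, std::ios::binary);
  if (!in) {
    std::cerr << "could not open " << data_path << std::endl;
    return false;
  }
  char header[8] = {};
  in.seekg(static_cast<std::streamoff>(offset));
  in.read(header, sizeof(header));
  if (in.gcount() != sizeof(header)) {
    std::cerr << "short read at offset " << offset << std::endl;
    return false;
  }
  static const char kMagic[8] = {'k', 'u', 'd', 'u', 'c', 'f', 'l', '2'};
  return memcmp(header, kMagic, sizeof(kMagic)) == 0;
}
{code}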
[jira] [Created] (KUDU-3228) Background process that checks for corrupted blocks and failed disks
Andrew Wong created KUDU-3228: - Summary: Background process that checks for corrupted blocks and failed disks Key: KUDU-3228 URL: https://issues.apache.org/jira/browse/KUDU-3228 Project: Kudu Issue Type: Improvement Components: cfile, fs, tserver Reporter: Andrew Wong Currently, CFile corruption and failed disks will result in any bad tablets being marked as failed, being re-replicated elsewhere, and any scans that were in progress for them being retried at other servers. Rather than waiting for the first bad access to do this, we may want to implement a background task that checks for corruption and proactively re-replicates such tablets. That way, especially when there are long periods of client inactivity, we can get the faulty-hardware-related re-replication out of the way. The task should probably only run when the tserver isn't serving many scans or writes. It should also avoid polluting the block cache, if attempting to check for CFile corruption. HDFS has a "disk checker" task that may be worth drawing inspiration from. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3227) Improve client error message when not a part of a trusted subnet
Andrew Wong created KUDU-3227: - Summary: Improve client error message when not a part of a trusted subnet Key: KUDU-3227 URL: https://issues.apache.org/jira/browse/KUDU-3227 Project: Kudu Issue Type: Improvement Components: client Reporter: Andrew Wong I recently saw a case where the Java application spit out this error, failing to connect to the cluster: {code:java} Caused by: org.apache.kudu.client.NoLeaderFoundException: Master config (master:7051) has no leader. Exceptions received: org.apache.kudu.client.RecoverableException: connection disconnected at org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:289) at org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:49) at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:365) at org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:354) at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280) at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259) at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1315) at com.stumbleupon.async.Deferred.doCall(Deferred.java:1286) {code} Other clients (i.e. Impala) were able to run similar queries without such errors, so it seemed localized to this one application. This was odd given the error message complains about not having a Master leader, a property of the cluster, not the client. Inspecting the master logs, it was relatively clear that {{--trusted_subnets}} was likely to blame (the server-side warning message that is spit out mentions it by name). It would be nice if this was obvious in the clients as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3227) Improve client error message when not a part of a trusted subnet
[ https://issues.apache.org/jira/browse/KUDU-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong updated KUDU-3227: -- Labels: newbie (was: ) > Improve client error message when not a part of a trusted subnet > > > Key: KUDU-3227 > URL: https://issues.apache.org/jira/browse/KUDU-3227 > Project: Kudu > Issue Type: Improvement > Components: client >Reporter: Andrew Wong >Priority: Major > Labels: newbie > > I recently saw a case where the Java application spit out this error, failing > to connect to the cluster: > {code:java} > Caused by: org.apache.kudu.client.NoLeaderFoundException: Master config > (master:7051) has no leader. Exceptions received: > org.apache.kudu.client.RecoverableException: connection disconnected > at > org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:289) > at > org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:49) > at > org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:365) > at > org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:354) > at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280) > at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259) > at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1315) > at com.stumbleupon.async.Deferred.doCall(Deferred.java:1286) > {code} > Other clients (i.e. Impala) were able to run similar queries without such > errors, so it seemed localized to this one application. This was odd given > the error message complains about not having a Master leader, a property of > the cluster, not the client. > Inspecting the master logs, it was relatively clear that > {{--trusted_subnets}} was likely to blame (the server-side warning message > that is spit out mentions it by name). It would be nice if this was obvious > in the clients as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3222) std::bad_alloc in full_stack-insert-scan-test.cc
[ https://issues.apache.org/jira/browse/KUDU-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249400#comment-17249400 ] Andrew Wong commented on KUDU-3222: --- FWIW we did have a run succeed on commit 420b07e6490e14f26107088bc1b09866b6d43bba. If this was caused by a code change, it is likely the gperftools bump 713fee390d0241bf466f490d5b2c678f7ebe5175, since the glog change was reverted. > std::bad_alloc in full_stack-insert-scan-test.cc > > > Key: KUDU-3222 > URL: https://issues.apache.org/jira/browse/KUDU-3222 > Project: Kudu > Issue Type: Bug > Components: test >Reporter: Andrew Wong >Priority: Major > Attachments: FullStackScanInsertMRSOnly3.log > > > Recently we've been starting to see the following in our runs of > full_stack-insert-scan-test: > {code:java} > I1214 13:30:32.995853 39072 full_stack-insert-scan-test.cc:271] Insertion > thread 7 of 50 is 69% done. > terminate called after throwing an instance of 'std::bad_alloc' > what(): std::bad_alloc > *** Aborted at 1607981433 (unix time) try "date -d @1607981433" if you are > using GNU date *** > PC: @ 0x3f85032625 __GI_raise > *** SIGABRT (@0x11569802) received by PID 38914 (TID 0x7f81b4a02700) from > PID 38914; stack trace: *** > @ 0xcf8a21 google::(anonymous namespace)::FailureSignalHandler() > @ 0x3f8540f710 (unknown) > @ 0x3f85032625 __GI_raise > @ 0x3f85033e05 __GI_abort > @ 0x3f884bea7d __gnu_cxx::__verbose_terminate_handler() > @ 0x3f884bcbd6 (unknown) > @ 0x3f884bcc03 std::terminate() > @ 0x3f884bcd22 __cxa_throw > @ 0xd14bd5 (anonymous namespace)::handle_oom() > @ 0x2ff3872 tcmalloc::allocate_full_cpp_throw_oom() > @ 0x2cd4c1a > _ZNSt6vectorIN4kudu19DecodedRowOperationESaIS1_EE17_M_realloc_insertIJRKS1_EEEvN9__gnu_cxx17__normal_iteratorIPS1_S3_EEDpOT_ > @ 0x2cd535a kudu::RowOperationsPBDecoder::DecodeOperations<>() > @ 0x131c2a6 kudu::tablet::Tablet::DecodeWriteOperations() > I1214 13:30:33.075912 39094 full_stack-insert-scan-test.cc:271] Insertion > thread 29 of 50 is 69% done. > @ 0x135bcb6 kudu::tablet::WriteOp::Prepare() > @ 0x13514ac kudu::tablet::OpDriver::Prepare() > @ 0x13520ed > _ZNSt17_Function_handlerIFvvEZN4kudu6tablet8OpDriver12ExecuteAsyncEvEUlvE_E9_M_invokeERKSt9_Any_data > @ 0x2e2409e kudu::ThreadPool::DispatchThread() > @ 0x2e1b2c5 kudu::Thread::SuperviseThread() > @ 0x3f854079d1 start_thread > @ 0x3f850e88fd clone {code} > This runs as a part of a suite of several tests in {{scripts/benchmarks.sh}}. > This started happening fairly consistently starting around commit > 2943aa701ee092158c2084c614a91f92513ef7c4, when we bumped glog and gperftools, > though I'm not sure they are directly related here. > The attached logs are a run on CentOS 6.6, with around 100GB of memory. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3222) std::bad_alloc in full_stack-insert-scan-test.cc
Andrew Wong created KUDU-3222: - Summary: std::bad_alloc in full_stack-insert-scan-test.cc Key: KUDU-3222 URL: https://issues.apache.org/jira/browse/KUDU-3222 Project: Kudu Issue Type: Bug Components: test Reporter: Andrew Wong Attachments: FullStackScanInsertMRSOnly3.log Recently we've been starting to see the following in our runs of full_stack-insert-scan-test: {code:java} I1214 13:30:32.995853 39072 full_stack-insert-scan-test.cc:271] Insertion thread 7 of 50 is 69% done. terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc *** Aborted at 1607981433 (unix time) try "date -d @1607981433" if you are using GNU date *** PC: @ 0x3f85032625 __GI_raise *** SIGABRT (@0x11569802) received by PID 38914 (TID 0x7f81b4a02700) from PID 38914; stack trace: *** @ 0xcf8a21 google::(anonymous namespace)::FailureSignalHandler() @ 0x3f8540f710 (unknown) @ 0x3f85032625 __GI_raise @ 0x3f85033e05 __GI_abort @ 0x3f884bea7d __gnu_cxx::__verbose_terminate_handler() @ 0x3f884bcbd6 (unknown) @ 0x3f884bcc03 std::terminate() @ 0x3f884bcd22 __cxa_throw @ 0xd14bd5 (anonymous namespace)::handle_oom() @ 0x2ff3872 tcmalloc::allocate_full_cpp_throw_oom() @ 0x2cd4c1a _ZNSt6vectorIN4kudu19DecodedRowOperationESaIS1_EE17_M_realloc_insertIJRKS1_EEEvN9__gnu_cxx17__normal_iteratorIPS1_S3_EEDpOT_ @ 0x2cd535a kudu::RowOperationsPBDecoder::DecodeOperations<>() @ 0x131c2a6 kudu::tablet::Tablet::DecodeWriteOperations() I1214 13:30:33.075912 39094 full_stack-insert-scan-test.cc:271] Insertion thread 29 of 50 is 69% done. @ 0x135bcb6 kudu::tablet::WriteOp::Prepare() @ 0x13514ac kudu::tablet::OpDriver::Prepare() @ 0x13520ed _ZNSt17_Function_handlerIFvvEZN4kudu6tablet8OpDriver12ExecuteAsyncEvEUlvE_E9_M_invokeERKSt9_Any_data @ 0x2e2409e kudu::ThreadPool::DispatchThread() @ 0x2e1b2c5 kudu::Thread::SuperviseThread() @ 0x3f854079d1 start_thread @ 0x3f850e88fd clone {code} This runs as a part of a suite of several tests in {{scripts/benchmarks.sh}}. This started happening fairly consistently starting around commit 2943aa701ee092158c2084c614a91f92513ef7c4, when we bumped glog and gperftools, though I'm not sure they are directly related here. The attached logs are a run on CentOS 6.6, with around 100GB of memory. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-2726) Very large tablets defeat budgeted compaction
[ https://issues.apache.org/jira/browse/KUDU-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248493#comment-17248493 ] Andrew Wong commented on KUDU-2726: --- I'm a bit hesitant to entirely move consideration of the maintenance adjustments into stage 1 – it seems like these are used for prioritizing the ops that would have been done, rather than defining whether or not an op is worth performing. With that distinction, we should try to introduce a solution that tackles the latter without affecting the former. That said, I wouldn't be against introducing further improvements to stage 1. Introducing some manually-defined value similar to {{maintenance_priority}} and {{maintenance_op_multiplier}} sounds like an OK solution in that some users may already be familiar with the existing multipliers. I'm not personally a fan of it because picking correct values for these configurations seems unintuitive, but I know there are Kudu users who do find this configuration effective. Another solution would be to have stage 1 also account for the size of a tablet: if a tablet is very large, increase the compaction performance score. An observation here is that compacting 128MiB worth of data in a single 50GiB tablet may result in a compaction perf score of below 0.01, despite the average rowset height being relatively high. If instead we imagined the tablet were actually two 25GiB tablets, a 128MiB compaction may result in a higher perf score. Based on this observation, rather than running the budgeted compaction policy against the entire tablet, we could run it on multiple subsets of the tablet. For instance, if we have a 50GiB tablet, define some window W=25GiB such that before running the compaction scoring/selection, if the tablet is over size W, we split the input rowsets into 50/W = 2 separate sets of rowsets, run the compaction scoring/selection algorithm on both of these sets, and pick the best perf scores among the sets. This would mean {{compaction_minimum_improvement}} would no longer apply to the entire tablet, but rather it would apply to W-sized chunks of the tablet. If going down the route I'm describing, there needs to be more thought given to ensuring this doesn't introduce some never-ending compaction loop, but I think the solution is a somewhat elegant workaround for the fact that Kudu doesn't support tablet splits today. > Very large tablets defeat budgeted compaction > - > > Key: KUDU-2726 > URL: https://issues.apache.org/jira/browse/KUDU-2726 > Project: Kudu > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: William Berkeley >Priority: Major > Labels: density, roadmap-candidate > > On very large tablets (50GB+), despite being very uncompacted with a large > average rowset height, a default budget (128MB) worth of compaction may not > reduce average rowset height enough to pass the minimum threshold. Thus the > tablet stays uncompacted forever. -- This message was sent by Atlassian Jira (v8.3.4#803005)
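A simplified sketch of the windowed scoring idea from the comment above, assuming hypothetical stand-ins for Kudu's rowsets and budgeted compaction scoring. Real code would split the rowset tree by keyspace rather than by iteration order; the sketch only shows the "score each W-sized chunk and take the best" shape.
{code:cpp}
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a rowset; only its on-disk size matters here.
struct RowSet {
  int64_t size_bytes;
};

// Toy placeholder for Kudu's budgeted compaction scoring; the real policy
// picks rowsets under a 128MiB budget and scores the reduction in average
// rowset height. Here we just return a dummy value so the sketch compiles.
double ScoreCompactionForRowsets(const std::vector<RowSet>& rowsets) {
  return rowsets.empty() ? 0.0 : 0.5;  // dummy score
}

// Sketch of the windowing idea: if the tablet's total size exceeds the
// window size W, split its rowsets into roughly W-sized chunks, score each
// chunk independently, and return the best score. A 50GiB tablet with
// W=25GiB is then scored as if it were two 25GiB tablets.
double BestWindowedScore(const std::vector<RowSet>& rowsets, int64_t window_bytes) {
  double best = 0.0;
  std::vector<RowSet> window;
  int64_t window_size = 0;
  for (const RowSet& rs : rowsets) {
    window.push_back(rs);
    window_size += rs.size_bytes;
    if (window_size >= window_bytes) {
      best = std::max(best, ScoreCompactionForRowsets(window));
      window.clear();
      window_size = 0;
    }
  }
  if (!window.empty()) {
    best = std::max(best, ScoreCompactionForRowsets(window));
  }
  return best;
}
{code}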
[jira] [Commented] (KUDU-613) Scan-resistant cache replacement algorithm for the block cache
[ https://issues.apache.org/jira/browse/KUDU-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247540#comment-17247540 ] Andrew Wong commented on KUDU-613: -- The Apache Impala project has pulled in Kudu's block cache implementation and [extended it with LIRS|https://gerrit.cloudera.org/c/15306/]. It's probably worth pulling those bits in and seeing how they fare against contentious large-scan workloads in Kudu. LIRS: [http://web.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-02-6.pdf] > Scan-resistant cache replacement algorithm for the block cache > -- > > Key: KUDU-613 > URL: https://issues.apache.org/jira/browse/KUDU-613 > Project: Kudu > Issue Type: Improvement > Components: perf >Affects Versions: M4.5 >Reporter: Andrew Wang >Priority: Major > Labels: roadmap-candidate > > The block cache currently uses LRU, which is vulnerable to large scan > workloads. It'd be good to implement something like 2Q. > ARC (patent encumbered, but good for ideas): > https://www.usenix.org/conference/fast-03/arc-self-tuning-low-overhead-replacement-cache > HBase (2Q like): > https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java -- This message was sent by Atlassian Jira (v8.3.4#803005)
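For context on what a 2Q-style policy buys over plain LRU, here is a tiny, generic admission sketch (not Kudu's block cache, nor Impala's LIRS port): a block is only promoted into the protected LRU on a second touch, so a one-pass large scan cycles through a small probationary queue without evicting the hot working set.
{code:cpp}
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Tiny 2Q-flavored sketch: first access lands a block in a small
// probationary FIFO; only a second touch while still probationary promotes
// it into the protected LRU. Scan traffic touches each block once, so it
// churns the FIFO without disturbing the LRU.
class ScanResistantCache {
 public:
  ScanResistantCache(size_t fifo_capacity, size_t lru_capacity)
      : fifo_capacity_(fifo_capacity), lru_capacity_(lru_capacity) {}

  // Records an access; returns true if the block is in the protected LRU.
  bool Access(const std::string& block_id) {
    auto lru_it = lru_index_.find(block_id);
    if (lru_it != lru_index_.end()) {
      lru_.splice(lru_.begin(), lru_, lru_it->second);  // refresh recency
      return true;
    }
    auto fifo_it = fifo_index_.find(block_id);
    if (fifo_it != fifo_index_.end()) {
      // Second touch while probationary: promote to the protected LRU.
      fifo_.erase(fifo_it->second);
      fifo_index_.erase(fifo_it);
      lru_.push_front(block_id);
      lru_index_[block_id] = lru_.begin();
      if (lru_.size() > lru_capacity_) {
        lru_index_.erase(lru_.back());
        lru_.pop_back();
      }
      return true;
    }
    // First touch: probationary admission only.
    fifo_.push_front(block_id);
    fifo_index_[block_id] = fifo_.begin();
    if (fifo_.size() > fifo_capacity_) {
      fifo_index_.erase(fifo_.back());
      fifo_.pop_back();
    }
    return false;
  }

 private:
  const size_t fifo_capacity_;
  const size_t lru_capacity_;
  std::list<std::string> fifo_;  // probationary queue, newest at front
  std::unordered_map<std::string, std::list<std::string>::iterator> fifo_index_;
  std::list<std::string> lru_;   // protected LRU, most recent at front
  std::unordered_map<std::string, std::list<std::string>::iterator> lru_index_;
};
{code}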
[jira] [Created] (KUDU-3220) Pre-commit appears to drop files sometimes
Andrew Wong created KUDU-3220: - Summary: Pre-commit appears to drop files sometimes Key: KUDU-3220 URL: https://issues.apache.org/jira/browse/KUDU-3220 Project: Kudu Issue Type: Bug Components: project-infra Reporter: Andrew Wong Attachments: consoleText.txt I had a DEBUG precommit job fail because some built artifacts can't be found, even though they were just built. {code:java} [ 35%] Building CXX object src/kudu/rpc/CMakeFiles/krpc_exported.dir/negotiation.cc.o [ 35%] Building CXX object src/kudu/security/CMakeFiles/security_test_util.dir/test/test_certs.cc.o [ 35%] Building CXX object src/kudu/fs/CMakeFiles/kudu_fs.dir/fs_report.cc.o ... c++: error: CMakeFiles/kudu_fs.dir/dir_util.cc.o: No such file or directory c++: error: CMakeFiles/kudu_fs.dir/error_manager.cc.o: No such file or directory c++: error: CMakeFiles/kudu_fs.dir/file_block_manager.cc.o: No such file or directory c++: error: CMakeFiles/kudu_fs.dir/fs_manager.cc.o: No such file or directory c++: error: CMakeFiles/kudu_fs.dir/fs_report.cc.o: No such file or directory {code} There was another DEBUG build running concurrently, but it appeared to land in a different workspace. I've retriggered the job and I don't expect it's related to my patch, but I'm opening this ticket in case others see similar issues on the pre-commit infra. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-3108) Tablet server crashes when handle diffscan request
[ https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-3108. --- Fix Version/s: 1.14.0 Resolution: Fixed > Tablet server crashes when handle diffscan request > --- > > Key: KUDU-3108 > URL: https://issues.apache.org/jira/browse/KUDU-3108 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.10.1 >Reporter: YifanZhang >Priority: Major > Fix For: 1.14.0 > > > When we did an incremental backup for tables in a cluster with 20 tservers, > 3 tservers crashed, coredump stacks are the same: > {code:java} > Unable to find source-code formatter for language: shell. Available languages > are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, > groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, > perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, > yamlProgram terminated with signal 11, Segmentation fault.Program terminated > with signal 11, Segmentation fault. > #0 kudu::Schema::Compare > (this=0x25b883680, lhs=..., rhs=...) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267 > 267 /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h: No such file > or directory. > Missing separate debuginfos, use: debuginfo-install > bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64 > cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64 > cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 elfutils-libelf-0.166-2.el7.x86_64 > elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64 > keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 > libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 > libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64 > libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-6.el7.x86_64 > ncurses-libs-5.9-13.20130511.el7.x86_64 > nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 > openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64 > systemd-libs-219-30.el7_3.8.x86_64 xz-libs-5.2.2-1.el7.x86_64 > zlib-1.2.7-17.el7.x86_64 > (gdb) bt > #0 kudu::Schema::Compare > (this=0x25b883680, lhs=..., rhs=...) 
at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267 > #1 0x01da51fb in kudu::MergeIterator::RefillHotHeap > (this=this@entry=0x78f6ec500) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:720 > #2 0x01da622b in kudu::MergeIterator::AdvanceAndReheap > (this=this@entry=0x78f6ec500, state=0xd1661a000, > num_rows_to_advance=num_rows_to_advance@entry=1) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:690 > #3 0x01da7927 in kudu::MergeIterator::MaterializeOneRow > (this=this@entry=0x78f6ec500, dst=dst@entry=0x7f0d5cc9ffc0, > dst_row_idx=dst_row_idx@entry=0x7f0d5cc9fbb0) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:894 > #4 0x01da7de3 in kudu::MergeIterator::NextBlock (this=0x78f6ec500, > dst=0x7f0d5cc9ffc0) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:796 > #5 0x00a9ff19 in kudu::tablet::Tablet::Iterator::NextBlock > (this=, dst=) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/tablet.cc:2499 > #6 0x0095475c in > kudu::tserver::TabletServiceImpl::HandleContinueScanRequest > (this=this@entry=0x53b5a90, req=req@entry=0x7f0d5cca0720, > rpc_context=rpc_context@entry=0x5e512a460, > result_collector=result_collector@entry=0x7f0d5cca0a00, > has_more_results=has_more_results@entry=0x7f0d5cca0886, > error_code=error_code@entry=0x7f0d5cca0888) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2565 > #7 0x00966564 in > kudu::tserver::TabletServiceImpl::HandleNewScanRequest > (this=this@entry=0x53b5a90, replica=0xf5c0189c0, req=req@entry=0x2a15c240, > rpc_context=rpc_context@entry=0x5e512a460, > result_collector=result_collector@entry=0x7f0d5cca0a00, > scanner_id=scanner_id@entry=0x7f0d5cca0940, > snap_timestamp=snap_timestamp@entry=0x7f0d5cca0950, > has_more_results=has_more_results@entry=0x7f0d5cca0886, > error_code=error_code@entry=0x7f0d5cca0888) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2476 > #8 0x00967f4b in kudu::tserver::TabletServiceImpl::Scan > (this=0x53b5a90, req=0x2a15c240, resp=0x56f9be6c0, context=0x5e512a460) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1674 > #9 0x01d2e449 in operator() (__args#2=0x5e512a460, > __args#1=0x56f9be6c0, __args#0=, this=0x497ecdd8) at > /usr/include/c++/4.8.2/functional:2471 > #10 kudu::rpc::GeneratedServiceIf::Handle (this=0x53b5a90, call= out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238400#comment-17238400 ] Andrew Wong commented on KUDU-2038: --- There is a patch for bitmap indexing out, but I don't think it is being actively worked on right now: https://gerrit.cloudera.org/c/11722/. It is something that I have wanted to revisit, but haven't had the time to prioritize recently. KUDU-3033 is another ticket that I think would be really helpful for reducing IO for selective predicates, but again I'm unaware of anyone working on it. If you're interested in picking up either feature, I'd be happy to help design and review. > Add b-tree or inverted index on value field > --- > > Key: KUDU-2038 > URL: https://issues.apache.org/jira/browse/KUDU-2038 > Project: Kudu > Issue Type: Task >Reporter: Yi Guolei >Priority: Major > Labels: roadmap-candidate > > Do we have a plan to add index on any column [not primary column] ? Currently > kudu does not have btree or inverted index on columns. In this case if a > query wants to filter a column then kudu has to scan all datas in all > rowsets. > For example, select * from table where salary > 1 and age < 40, the bloom > filter or min max index will have no effect, kudu has to scan all datas in > all row sets. But if kudu has inverted index, then it will be much faster. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KUDU-3108) Tablet server crashes when handle diffscan request
[ https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223996#comment-17223996 ] Andrew Wong edited comment on KUDU-3108 at 11/21/20, 8:24 AM: -- I've been doing some fuzz testing using {{fuzz-itest.cc}} and reproduced this crash with the following sequence. {code:java} TEST_F(FuzzTest, Kudu3108) { CreateTabletAndStartClusterWithSchema(CreateKeyValueTestSchema()); RunFuzzCase({ {TEST_INSERT, 1}, {TEST_FLUSH_OPS}, {TEST_FLUSH_TABLET}, {TEST_INSERT_IGNORE_PK_ONLY, 3}, {TEST_DELETE, 1}, {TEST_FLUSH_OPS}, {TEST_FLUSH_TABLET}, {TEST_UPSERT, 3}, {TEST_UPSERT_PK_ONLY, 1}, {TEST_INSERT, 0}, {TEST_FLUSH_OPS}, {TEST_UPDATE_IGNORE, 0}, {TEST_UPDATE, 3}, {TEST_FLUSH_OPS}, {TEST_DIFF_SCAN, 5, 15}, }); } {code} This results in the following crash: {code:java} F1030 21:16:58.411253 40800 schema.h:706] Check failed: KeyEquals(*lhs.schema()) && KeyEquals(*rhs.schema()) *** Check failure stack trace: *** *** Aborted at 1604117818 (unix time) try "date -d @1604117818" if you are using GNU date *** PC: @ 0x7f701fcf11d7 __GI_raise *** SIGABRT (@0x11179efd) received by PID 40701 (TID 0x7f6ff0f47700) from PID 40701; stack trace: *** @ 0x7f7026a70370 (unknown) @ 0x7f701fcf11d7 __GI_raise @ 0x7f701fcf28c8 __GI_abort @ 0x7f70224377b9 google::logging_fail() @ 0x7f7022438f8d google::LogMessage::Fail() @ 0x7f702243aee3 google::LogMessage::SendToLog() @ 0x7f7022438ae9 google::LogMessage::Flush() @ 0x7f702243b86f google::LogMessageFatal::~LogMessageFatal() @ 0x7f702cc99fbc kudu::Schema::Compare<>() @ 0x7f7026167cfd kudu::MergeIterator::RefillHotHeap() @ 0x7f7026167357 kudu::MergeIterator::AdvanceAndReheap() @ 0x7f7026169617 kudu::MergeIterator::MaterializeOneRow() @ 0x7f70261688e9 kudu::MergeIterator::NextBlock() @ 0x7f702cbddd9b kudu::tablet::Tablet::Iterator::NextBlock() @ 0x7f70317bcab3 kudu::tserver::TabletServiceImpl::HandleContinueScanRequest() @ 0x7f70317bb857 kudu::tserver::TabletServiceImpl::HandleNewScanRequest() @ 0x7f70317b464e kudu::tserver::TabletServiceImpl::Scan() @ 0x7f702ddfd762 _ZZN4kudu7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE4_clESG_SH_SJ_ @ 0x7f702de0064d _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E4_E9_M_invokeERKSt9_Any_dataS4_S5_S9_ @ 0x7f702b4ddcc2 std::function<>::operator()() @ 0x7f702b4dd6ed kudu::rpc::GeneratedServiceIf::Handle() @ 0x7f702b4dfff8 kudu::rpc::ServicePool::RunThread() @ 0x7f702b4de8c5 _ZZN4kudu3rpc11ServicePool4InitEiENKUlvE_clEv @ 0x7f702b4e0337 _ZNSt17_Function_handlerIFvvEZN4kudu3rpc11ServicePool4InitEiEUlvE_E9_M_invokeERKSt9_Any_data @ 0x7f7033524b9c std::function<>::operator()() @ 0x7f70248227e0 kudu::Thread::SuperviseThread() @ 0x7f7026a68dc5 start_thread @ 0x7f701fdb376d __clone Aborted {code} I haven't fully grokked this sequence, but I will look into this in the coming days. was (Author: andrew.wong): I've been doing some fuzz testing using {{fuzz-itest.cc}} and reproduced this crash with the following sequence (ignore the \{{-1}}s – their functionality is not committed yet). 
{code:java} TEST_F(FuzzTest, Kudu3108) { CreateTabletAndStartClusterWithSchema(CreateKeyValueTestSchema()); RunFuzzCase({ {TEST_INSERT, 1, -1}, {TEST_FLUSH_OPS, -1}, {TEST_FLUSH_TABLET}, {TEST_INSERT_IGNORE, 3, -1}, {TEST_DELETE, 1}, {TEST_FLUSH_OPS, -1}, {TEST_FLUSH_TABLET}, {TEST_UPSERT, 3}, {TEST_UPSERT_PK_ONLY, 1}, {TEST_INSERT, 0, -1}, {TEST_FLUSH_OPS, -1}, {TEST_UPDATE_IGNORE, 0}, {TEST_UPDATE, 3}, {TEST_FLUSH_OPS, -1}, {TEST_DIFF_SCAN, 5, 15}, }); } {code} This results in the following crash: {code:java} F1030 21:16:58.411253 40800 schema.h:706] Check failed: KeyEquals(*lhs.schema()) && KeyEquals(*rhs.schema()) *** Check failure stack trace: *** *** Aborted at 1604117818 (unix time) try "date -d @1604117818" if you are using GNU date *** PC: @ 0x7f701fcf11d7 __GI_raise *** SIGABRT (@0x11179efd) received by PID 40701 (TID 0x7f6ff0f47700) from PID 40701; stack trace: *** @ 0x7f7026a70370 (unknown) @ 0x7f701fcf11d7 __GI_raise @ 0x7f701fcf28c8 __GI_abort @ 0x7f70224377b9 google::logging_fail() @ 0x7f7022438f8d google::LogMessage::Fail() @ 0x7f702243aee3 google::LogMessage::SendToLog() @ 0x7f7022438ae9 google::LogMessage::Flush() @ 0x7f702243b86f
[jira] [Created] (KUDU-3213) Java client should attempt a different tablet server when retrying during tserver quiescing
Andrew Wong created KUDU-3213: - Summary: Java client should attempt a different tablet server when retrying during tserver quiescing Key: KUDU-3213 URL: https://issues.apache.org/jira/browse/KUDU-3213 Project: Kudu Issue Type: Bug Components: java Reporter: Andrew Wong One of our clusters ran into the following error message when leaving a tablet server quiesced for an extended period of time: {code:java} ERROR Runner: Pipeline exception occurred: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 6.0 failed 4 times, most recent failure: Lost task 1.3 in stage 6.0 (TID 1922, tserver-018.edh.company.com, executor 58): org.apache.kudu.client.NonRecoverableException: cannot complete before timeout: ScanRequest(scannerId=null, tablet=9e17b554f85f4a7f855771d8b5c913f5, attempt=24, KuduRpc(method=Scan, tablet=9e17b554f85f4a7f855771d8b5c913f5, attempt=24, DeadlineTracker(timeout=3, elapsed=27988), Traces: [0ms] refreshing cache from master, [1ms] Sub RPC GetTableLocations: sending RPC to server master-name-003.edh.company.com:7051, [12ms] Sub RPC GetTableLocations: received response from server master-name-003.edh.company.com:7051: OK, [22ms] sending RPC to server e1a4405443d845249b5ed15c8e882211, [116ms] delaying RPC due to: Service unavailable: Tablet server is quiescing (error 0), [117ms] received response from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet server is quiescing (error 0), [126ms] sending RPC to server e1a4405443d845249b5ed15c8e882211, [129ms] delaying RPC due to: Service unavailable: Tablet server is quiescing (error 0), [129ms] received response from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet server is quiescing (error 0), [146ms] sending RPC to server e1a4405443d845249b5ed15c8e882211, [149ms] delaying RPC due to: Service unavailable: Tablet server is quiescing (error 0), [149ms] received response from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet server is quiescing (error 0), [166ms] sending RPC to server e1a4405443d845249b5ed15c8e882211, [168ms] delaying RPC due to: Service unavailable: Tablet server is quiescing (error 0), [168ms] received response from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet server is quiescing (error 0), [206ms] sending RPC to server e1a4405443d845249b5ed15c8e882211, [209ms] delaying RPC due to: Service unavailable: Tablet server is quiescing (error 0), [209ms] received response from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet server is quiescing (error 0), [266ms] sending RPC to server e1a4405443d845249b5ed15c8e882211, [268ms] delaying RPC due to: Service unavailable: Tablet server is quiescing (error 0), [268ms] received response from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet server is quiescing (error 0), [306ms] sending RPC to server e1a4405443d845249b5ed15c8e882211, [308ms] delaying RPC due to: Service unavailable: Tablet server is quiescing (error 0), [308ms] received response from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet server is quiescing (error 0), [545ms] sending RPC to server e1a4405443d845249b5ed15c8e882211, [548ms] delaying RPC due to: Service unavailable: Tablet server is quiescing (error 0), [548ms] received response from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet server is quiescing (error 0), [865ms] sending RPC to server e1a4405443d845249b5ed15c8e882211, [868ms] delaying RPC due to: Service 
unavailable: Tablet server is quiescing (error 0), [868ms] received response from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet server is quiescing (error 0), [1266ms] sending RPC to server e1a4405443d845249b5ed15c8e882211, [1269ms] delaying RPC due to: Service unavailable: Tablet server is quiescing (error 0), [1269ms] received response from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet server is quiescing (error 0), [2626ms] sending RPC to server e1a4405443d845249b5ed15c8e882211, [2628ms] delaying RPC due to: Service unavailable: Tablet server is quiescing (error 0), [2628ms] received response from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet server is quiescing (error 0), [4746ms] sending RPC to server e1a4405443d845249b5ed15c8e882211, [4749ms] delaying RPC due to: Service unavailable: Tablet server is quiescing (error 0), [4749ms] received response from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet server is quiescing (error 0), [8206ms] sending RPC to server e1a4405443d845249b5ed15c8e882211, [8209ms] delaying RPC due to: Service unavailable: Tablet server is quiescing (error 0), [8209ms] received response from server e1a4405443d845249b5ed15c8e882
[jira] [Comment Edited] (KUDU-3108) Tablet server crashes when handle diffscan request
[ https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223997#comment-17223997 ] Andrew Wong edited comment on KUDU-3108 at 10/31/20, 4:35 AM: -- Still not an explanation, but if I change the last row to {{\{TEST_DIFF_SCAN, 5, 10\}}} the test doesn't pass, but it at least doesn't crash. was (Author: andrew.wong): Still not an explanation, but if I change the last row to {{{TEST_DIFF_SCAN, 5, 10}}} the test doesn't pass, but it at least doesn't crash. > Tablet server crashes when handle diffscan request > --- > > Key: KUDU-3108 > URL: https://issues.apache.org/jira/browse/KUDU-3108 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.10.1 >Reporter: YifanZhang >Priority: Major > > When we did an incremental backup for tables in a cluster with 20 tservers, > 3 tservers crashed, coredump stacks are the same: > {code:java} > Unable to find source-code formatter for language: shell. Available languages > are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, > groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, > perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, > yamlProgram terminated with signal 11, Segmentation fault.Program terminated > with signal 11, Segmentation fault. > #0 kudu::Schema::Compare > (this=0x25b883680, lhs=..., rhs=...) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267 > 267 /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h: No such file > or directory. > Missing separate debuginfos, use: debuginfo-install > bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64 > cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64 > cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 elfutils-libelf-0.166-2.el7.x86_64 > elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64 > keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 > libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 > libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64 > libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-6.el7.x86_64 > ncurses-libs-5.9-13.20130511.el7.x86_64 > nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 > openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64 > systemd-libs-219-30.el7_3.8.x86_64 xz-libs-5.2.2-1.el7.x86_64 > zlib-1.2.7-17.el7.x86_64 > (gdb) bt > #0 kudu::Schema::Compare > (this=0x25b883680, lhs=..., rhs=...) 
at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267 > #1 0x01da51fb in kudu::MergeIterator::RefillHotHeap > (this=this@entry=0x78f6ec500) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:720 > #2 0x01da622b in kudu::MergeIterator::AdvanceAndReheap > (this=this@entry=0x78f6ec500, state=0xd1661a000, > num_rows_to_advance=num_rows_to_advance@entry=1) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:690 > #3 0x01da7927 in kudu::MergeIterator::MaterializeOneRow > (this=this@entry=0x78f6ec500, dst=dst@entry=0x7f0d5cc9ffc0, > dst_row_idx=dst_row_idx@entry=0x7f0d5cc9fbb0) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:894 > #4 0x01da7de3 in kudu::MergeIterator::NextBlock (this=0x78f6ec500, > dst=0x7f0d5cc9ffc0) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:796 > #5 0x00a9ff19 in kudu::tablet::Tablet::Iterator::NextBlock > (this=, dst=) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/tablet.cc:2499 > #6 0x0095475c in > kudu::tserver::TabletServiceImpl::HandleContinueScanRequest > (this=this@entry=0x53b5a90, req=req@entry=0x7f0d5cca0720, > rpc_context=rpc_context@entry=0x5e512a460, > result_collector=result_collector@entry=0x7f0d5cca0a00, > has_more_results=has_more_results@entry=0x7f0d5cca0886, > error_code=error_code@entry=0x7f0d5cca0888) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2565 > #7 0x00966564 in > kudu::tserver::TabletServiceImpl::HandleNewScanRequest > (this=this@entry=0x53b5a90, replica=0xf5c0189c0, req=req@entry=0x2a15c240, > rpc_context=rpc_context@entry=0x5e512a460, > result_collector=result_collector@entry=0x7f0d5cca0a00, > scanner_id=scanner_id@entry=0x7f0d5cca0940, > snap_timestamp=snap_timestamp@entry=0x7f0d5cca0950, > has_more_results=has_more_results@entry=0x7f0d5cca0886, > error_code=error_code@entry=0x7f0d5cca0888) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2476 > #8 0x00967f4b in kudu::tserver::TabletServiceImpl::Scan > (this=0x53b5a90, req=0x2a15c240, resp=0x56f9be6c0, context=0x5e512a460) at > /home/zhangyifan8/work/
[jira] [Commented] (KUDU-3108) Tablet server crashes when handle diffscan request
[ https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223997#comment-17223997 ] Andrew Wong commented on KUDU-3108: --- Still not an explanation, but if I change the last row to {{{TEST_DIFF_SCAN, 5, 10}}} the test doesn't pass, but it at least doesn't crash. > Tablet server crashes when handle diffscan request > --- > > Key: KUDU-3108 > URL: https://issues.apache.org/jira/browse/KUDU-3108 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.10.1 >Reporter: YifanZhang >Priority: Major > > When we did an incremental backup for tables in a cluster with 20 tservers, > 3 tservers crashed, coredump stacks are the same: > {code:java} > Unable to find source-code formatter for language: shell. Available languages > are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, > groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, > perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, > yamlProgram terminated with signal 11, Segmentation fault.Program terminated > with signal 11, Segmentation fault. > #0 kudu::Schema::Compare > (this=0x25b883680, lhs=..., rhs=...) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267 > 267 /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h: No such file > or directory. > Missing separate debuginfos, use: debuginfo-install > bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64 > cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64 > cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 elfutils-libelf-0.166-2.el7.x86_64 > elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64 > keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 > libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 > libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64 > libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-6.el7.x86_64 > ncurses-libs-5.9-13.20130511.el7.x86_64 > nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 > openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64 > systemd-libs-219-30.el7_3.8.x86_64 xz-libs-5.2.2-1.el7.x86_64 > zlib-1.2.7-17.el7.x86_64 > (gdb) bt > #0 kudu::Schema::Compare > (this=0x25b883680, lhs=..., rhs=...) 
at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267 > #1 0x01da51fb in kudu::MergeIterator::RefillHotHeap > (this=this@entry=0x78f6ec500) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:720 > #2 0x01da622b in kudu::MergeIterator::AdvanceAndReheap > (this=this@entry=0x78f6ec500, state=0xd1661a000, > num_rows_to_advance=num_rows_to_advance@entry=1) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:690 > #3 0x01da7927 in kudu::MergeIterator::MaterializeOneRow > (this=this@entry=0x78f6ec500, dst=dst@entry=0x7f0d5cc9ffc0, > dst_row_idx=dst_row_idx@entry=0x7f0d5cc9fbb0) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:894 > #4 0x01da7de3 in kudu::MergeIterator::NextBlock (this=0x78f6ec500, > dst=0x7f0d5cc9ffc0) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:796 > #5 0x00a9ff19 in kudu::tablet::Tablet::Iterator::NextBlock > (this=, dst=) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/tablet.cc:2499 > #6 0x0095475c in > kudu::tserver::TabletServiceImpl::HandleContinueScanRequest > (this=this@entry=0x53b5a90, req=req@entry=0x7f0d5cca0720, > rpc_context=rpc_context@entry=0x5e512a460, > result_collector=result_collector@entry=0x7f0d5cca0a00, > has_more_results=has_more_results@entry=0x7f0d5cca0886, > error_code=error_code@entry=0x7f0d5cca0888) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2565 > #7 0x00966564 in > kudu::tserver::TabletServiceImpl::HandleNewScanRequest > (this=this@entry=0x53b5a90, replica=0xf5c0189c0, req=req@entry=0x2a15c240, > rpc_context=rpc_context@entry=0x5e512a460, > result_collector=result_collector@entry=0x7f0d5cca0a00, > scanner_id=scanner_id@entry=0x7f0d5cca0940, > snap_timestamp=snap_timestamp@entry=0x7f0d5cca0950, > has_more_results=has_more_results@entry=0x7f0d5cca0886, > error_code=error_code@entry=0x7f0d5cca0888) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2476 > #8 0x00967f4b in kudu::tserver::TabletServiceImpl::Scan > (this=0x53b5a90, req=0x2a15c240, resp=0x56f9be6c0, context=0x5e512a460) at > /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1674 > #9 0x01d2e449 in operator() (__args#2=0x5e512a460, > __args#1=0x56f9be6c0, __args#0=, this=0x497ecdd8) at > /usr/include/c++/4.8.2/functional:2471 > #10 kudu::rpc::
[jira] [Comment Edited] (KUDU-3108) Tablet server crashes when handle diffscan request
[ https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223996#comment-17223996 ] Andrew Wong edited comment on KUDU-3108 at 10/31/20, 4:30 AM: -- I've been doing some fuzz testing using {{fuzz-itest.cc}} and reproduced this crash with the following sequence (ignore the \{{-1}}s – their functionality is not committed yet). {code:java} TEST_F(FuzzTest, Kudu3108) { CreateTabletAndStartClusterWithSchema(CreateKeyValueTestSchema()); RunFuzzCase({ {TEST_INSERT, 1, -1}, {TEST_FLUSH_OPS, -1}, {TEST_FLUSH_TABLET}, {TEST_INSERT_IGNORE, 3, -1}, {TEST_DELETE, 1}, {TEST_FLUSH_OPS, -1}, {TEST_FLUSH_TABLET}, {TEST_UPSERT, 3}, {TEST_UPSERT_PK_ONLY, 1}, {TEST_INSERT, 0, -1}, {TEST_FLUSH_OPS, -1}, {TEST_UPDATE_IGNORE, 0}, {TEST_UPDATE, 3}, {TEST_FLUSH_OPS, -1}, {TEST_DIFF_SCAN, 5, 15}, }); } {code} This results in the following crash: {code:java} F1030 21:16:58.411253 40800 schema.h:706] Check failed: KeyEquals(*lhs.schema()) && KeyEquals(*rhs.schema()) *** Check failure stack trace: *** *** Aborted at 1604117818 (unix time) try "date -d @1604117818" if you are using GNU date *** PC: @ 0x7f701fcf11d7 __GI_raise *** SIGABRT (@0x11179efd) received by PID 40701 (TID 0x7f6ff0f47700) from PID 40701; stack trace: *** @ 0x7f7026a70370 (unknown) @ 0x7f701fcf11d7 __GI_raise @ 0x7f701fcf28c8 __GI_abort @ 0x7f70224377b9 google::logging_fail() @ 0x7f7022438f8d google::LogMessage::Fail() @ 0x7f702243aee3 google::LogMessage::SendToLog() @ 0x7f7022438ae9 google::LogMessage::Flush() @ 0x7f702243b86f google::LogMessageFatal::~LogMessageFatal() @ 0x7f702cc99fbc kudu::Schema::Compare<>() @ 0x7f7026167cfd kudu::MergeIterator::RefillHotHeap() @ 0x7f7026167357 kudu::MergeIterator::AdvanceAndReheap() @ 0x7f7026169617 kudu::MergeIterator::MaterializeOneRow() @ 0x7f70261688e9 kudu::MergeIterator::NextBlock() @ 0x7f702cbddd9b kudu::tablet::Tablet::Iterator::NextBlock() @ 0x7f70317bcab3 kudu::tserver::TabletServiceImpl::HandleContinueScanRequest() @ 0x7f70317bb857 kudu::tserver::TabletServiceImpl::HandleNewScanRequest() @ 0x7f70317b464e kudu::tserver::TabletServiceImpl::Scan() @ 0x7f702ddfd762 _ZZN4kudu7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE4_clESG_SH_SJ_ @ 0x7f702de0064d _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E4_E9_M_invokeERKSt9_Any_dataS4_S5_S9_ @ 0x7f702b4ddcc2 std::function<>::operator()() @ 0x7f702b4dd6ed kudu::rpc::GeneratedServiceIf::Handle() @ 0x7f702b4dfff8 kudu::rpc::ServicePool::RunThread() @ 0x7f702b4de8c5 _ZZN4kudu3rpc11ServicePool4InitEiENKUlvE_clEv @ 0x7f702b4e0337 _ZNSt17_Function_handlerIFvvEZN4kudu3rpc11ServicePool4InitEiEUlvE_E9_M_invokeERKSt9_Any_data @ 0x7f7033524b9c std::function<>::operator()() @ 0x7f70248227e0 kudu::Thread::SuperviseThread() @ 0x7f7026a68dc5 start_thread @ 0x7f701fdb376d __clone Aborted {code} I haven't fully grokked this sequence, but I will look into this in the coming days. was (Author: andrew.wong): I've been doing some fuzz testing using {{fuzz-itest.cc}} and reproduced this crash with the following sequence (ignore the \{{-1}}s – their functionality is not committed yet). 
{code:java} TEST_F(FuzzTest, Kudu3108) { CreateTabletAndStartClusterWithSchema(CreateKeyValueTestSchema()); RunFuzzCase({ {TEST_INSERT, 1, -1}, {TEST_FLUSH_OPS, -1}, {TEST_FLUSH_TABLET}, {TEST_INSERT_IGNORE, 3, -1}, {TEST_DELETE, 1}, {TEST_FLUSH_OPS, -1}, {TEST_FLUSH_TABLET}, {TEST_UPSERT, 3}, {TEST_UPSERT_PK_ONLY, 1}, {TEST_INSERT, 0, -1}, {TEST_FLUSH_OPS, -1}, {TEST_UPDATE_IGNORE, 0}, {TEST_UPDATE, 3}, {TEST_FLUSH_OPS, -1}, {TEST_UPSERT_PK_ONLY, 1}, {TEST_UPSERT, 3}, {TEST_INSERT, 2, -1}, {TEST_DIFF_SCAN, 5, 15}, }); } {code} This results in the following crash: {code:java} F1030 21:16:58.411253 40800 schema.h:706] Check failed: KeyEquals(*lhs.schema()) && KeyEquals(*rhs.schema()) *** Check failure stack trace: *** *** Aborted at 1604117818 (unix time) try "date -d @1604117818" if you are using GNU date *** PC: @ 0x7f701fcf11d7 __GI_raise *** SIGABRT (@0x11179efd) received by PID 40701 (TID 0x7f6ff0f47700) from PID 40701; stack trace: *** @ 0x7f7026a70370 (unknown) @ 0x7f701fcf11d7 __GI_raise @ 0x7f701fcf28c8 __GI_abort @ 0x7f70224377b9 google::logging_fail() @ 0x7f7022438f
[jira] [Commented] (KUDU-3108) Tablet server crashes when handle diffscan request
[ https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223996#comment-17223996 ] Andrew Wong commented on KUDU-3108: --- I've been doing some fuzz testing using {{fuzz-itest.cc}} and reproduced this crash with the following sequence (ignore the \{{-1}}s – their functionality is not committed yet). {code:java} TEST_F(FuzzTest, Kudu3108) { CreateTabletAndStartClusterWithSchema(CreateKeyValueTestSchema()); RunFuzzCase({ {TEST_INSERT, 1, -1}, {TEST_FLUSH_OPS, -1}, {TEST_FLUSH_TABLET}, {TEST_INSERT_IGNORE, 3, -1}, {TEST_DELETE, 1}, {TEST_FLUSH_OPS, -1}, {TEST_FLUSH_TABLET}, {TEST_UPSERT, 3}, {TEST_UPSERT_PK_ONLY, 1}, {TEST_INSERT, 0, -1}, {TEST_FLUSH_OPS, -1}, {TEST_UPDATE_IGNORE, 0}, {TEST_UPDATE, 3}, {TEST_FLUSH_OPS, -1}, {TEST_UPSERT_PK_ONLY, 1}, {TEST_UPSERT, 3}, {TEST_INSERT, 2, -1}, {TEST_DIFF_SCAN, 5, 15}, }); } {code} This results in the following crash: {code:java} F1030 21:16:58.411253 40800 schema.h:706] Check failed: KeyEquals(*lhs.schema()) && KeyEquals(*rhs.schema()) *** Check failure stack trace: *** *** Aborted at 1604117818 (unix time) try "date -d @1604117818" if you are using GNU date *** PC: @ 0x7f701fcf11d7 __GI_raise *** SIGABRT (@0x11179efd) received by PID 40701 (TID 0x7f6ff0f47700) from PID 40701; stack trace: *** @ 0x7f7026a70370 (unknown) @ 0x7f701fcf11d7 __GI_raise @ 0x7f701fcf28c8 __GI_abort @ 0x7f70224377b9 google::logging_fail() @ 0x7f7022438f8d google::LogMessage::Fail() @ 0x7f702243aee3 google::LogMessage::SendToLog() @ 0x7f7022438ae9 google::LogMessage::Flush() @ 0x7f702243b86f google::LogMessageFatal::~LogMessageFatal() @ 0x7f702cc99fbc kudu::Schema::Compare<>() @ 0x7f7026167cfd kudu::MergeIterator::RefillHotHeap() @ 0x7f7026167357 kudu::MergeIterator::AdvanceAndReheap() @ 0x7f7026169617 kudu::MergeIterator::MaterializeOneRow() @ 0x7f70261688e9 kudu::MergeIterator::NextBlock() @ 0x7f702cbddd9b kudu::tablet::Tablet::Iterator::NextBlock() @ 0x7f70317bcab3 kudu::tserver::TabletServiceImpl::HandleContinueScanRequest() @ 0x7f70317bb857 kudu::tserver::TabletServiceImpl::HandleNewScanRequest() @ 0x7f70317b464e kudu::tserver::TabletServiceImpl::Scan() @ 0x7f702ddfd762 _ZZN4kudu7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE4_clESG_SH_SJ_ @ 0x7f702de0064d _ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E4_E9_M_invokeERKSt9_Any_dataS4_S5_S9_ @ 0x7f702b4ddcc2 std::function<>::operator()() @ 0x7f702b4dd6ed kudu::rpc::GeneratedServiceIf::Handle() @ 0x7f702b4dfff8 kudu::rpc::ServicePool::RunThread() @ 0x7f702b4de8c5 _ZZN4kudu3rpc11ServicePool4InitEiENKUlvE_clEv @ 0x7f702b4e0337 _ZNSt17_Function_handlerIFvvEZN4kudu3rpc11ServicePool4InitEiEUlvE_E9_M_invokeERKSt9_Any_data @ 0x7f7033524b9c std::function<>::operator()() @ 0x7f70248227e0 kudu::Thread::SuperviseThread() @ 0x7f7026a68dc5 start_thread @ 0x7f701fdb376d __clone Aborted {code} I haven't fully grokked this sequence, but I will look into this in the coming days. 
> Tablet server crashes when handle diffscan request > --- > > Key: KUDU-3108 > URL: https://issues.apache.org/jira/browse/KUDU-3108 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.10.1 >Reporter: YifanZhang >Priority: Major > > When we did an incremental backup for tables in a cluster with 20 tservers, > 3 tservers crashed, coredump stacks are the same: > {code:java} > Unable to find source-code formatter for language: shell. Available languages > are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, > groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, > perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, > yamlProgram terminated with signal 11, Segmentation fault.Program terminated > with signal 11, Segmentation fault. > #0 kudu::Schema::Compare > (this=0x25b883680, lhs=..., rhs=...) at > /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267 > 267 /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h: No such file > or directory. > Missing separate debuginfos, use: debuginfo-install > bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64 > cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64 > cyrus-sasl-plai
[jira] [Created] (KUDU-3209) Allow decommissioning tool to run without rebalancing the rest of the cluster
Andrew Wong created KUDU-3209: - Summary: Allow decommissioning tool to run without rebalancing the rest of the cluster Key: KUDU-3209 URL: https://issues.apache.org/jira/browse/KUDU-3209 Project: Kudu Issue Type: Improvement Components: CLI, ops-tooling Reporter: Andrew Wong Currently when specifying {{--move_replicas_from_ignored_tservers}} to the rebalancer tool, the tool first empties the ignored tablet servers, and then runs rebalancing of the rest of the cluster. While true to its name as the "rebalancer tool", this tight coupling isn't always desired, especially given how heavy-weight a full cluster rebalancing can be upon first usage. It'd be nice if users could specify some {{--empty_ignored_tservers_only}} flag that made no attempt at further rebalancing once decommissioning completes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3204) Add a metric for amount of available space
Andrew Wong created KUDU-3204: - Summary: Add a metric for amount of available space Key: KUDU-3204 URL: https://issues.apache.org/jira/browse/KUDU-3204 Project: Kudu Issue Type: Improvement Components: fs, metrics Reporter: Andrew Wong It'd be convenient to expose a metric for how much space there is available on each tablet server (accounting for {{fs_wal_dir_reserved_bytes}} and {{fs_data_dirs_reserved_bytes}}). This would be useful in implementing a replica placement policy based on available space. It's probably worth separating metrics for available WAL directory space and available data directory space. E.g. we may want to treat a lack of space differently depending on whether we are limited on WAL space and not limited on data space, and vice versa. -- This message was sent by Atlassian Jira (v8.3.4#803005)
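A minimal sketch of the computation such a metric would report, assuming a directory path and a reserved-bytes setting as inputs. This is illustrative C++ using std::filesystem, not Kudu's fs or metrics code:

{code:java}
// Illustrative only: available bytes on a directory's filesystem, minus the
// configured reservation (e.g. --fs_data_dirs_reserved_bytes), clamped at 0.
#include <algorithm>
#include <cstdint>
#include <filesystem>

int64_t AvailableBytes(const std::filesystem::path& dir, int64_t reserved_bytes) {
  const std::filesystem::space_info info = std::filesystem::space(dir);
  const int64_t avail = static_cast<int64_t>(info.available) - reserved_bytes;
  return std::max<int64_t>(avail, int64_t{0});
}
{code}

Exposing separate gauges for the WAL directory and for the data directories would then let operators (and a space-aware placement policy) distinguish the two cases described above.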
[jira] [Updated] (KUDU-3149) Lock contention between registering ops and computing maintenance op stats
[ https://issues.apache.org/jira/browse/KUDU-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong updated KUDU-3149: -- Fix Version/s: 1.14.0 Resolution: Fixed Status: Resolved (was: In Review) > Lock contention between registering ops and computing maintenance op stats > -- > > Key: KUDU-3149 > URL: https://issues.apache.org/jira/browse/KUDU-3149 > Project: Kudu > Issue Type: Bug > Components: perf, tserver >Reporter: Andrew Wong >Priority: Critical > Fix For: 1.14.0 > > > We saw a bunch of tablets bootstrapping extremely slowly, and many stuck > supposedly bootstrapping, but not showing up in the {{/tablets}} page, i.e. > we could only see INITIALIZED and RUNNING tablets, no BOOTSTRAPPING. > Upon digging into the stacks, we saw a bunch waiting to acquire the MM lock: > {code:java} > TID 46577(tablet-open [wo): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x23980ff kudu::MaintenanceManager::RegisterOp() > @ 0xb59b99 kudu::tablet::Tablet::RegisterMaintenanceOps() > @ 0xb855a1 > kudu::tablet::TabletReplica::RegisterMaintenanceOps() > @ 0xa0055b kudu::tserver::TSTabletManager::OpenTablet() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > TID 46574(tablet-open [wo): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x23980ff kudu::MaintenanceManager::RegisterOp() > @ 0xb59c74 kudu::tablet::Tablet::RegisterMaintenanceOps() > @ 0xb855a1 > kudu::tablet::TabletReplica::RegisterMaintenanceOps() > @ 0xa0055b kudu::tserver::TSTabletManager::OpenTablet() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > 7 threads with same stack: > TID 46575(tablet-open [wo): > TID 46576(tablet-open [wo): > TID 46578(tablet-open [wo): > TID 46580(tablet-open [wo): > TID 46581(tablet-open [wo): > TID 46582(tablet-open [wo): > TID 46583(tablet-open [wo): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x23980ff kudu::MaintenanceManager::RegisterOp() > @ 0xb85374 > kudu::tablet::TabletReplica::RegisterMaintenanceOps() > @ 0xa0055b kudu::tserver::TSTabletManager::OpenTablet() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > TID 46573(tablet-open [wo): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x23980ff kudu::MaintenanceManager::RegisterOp() > @ 0xb854c7 > kudu::tablet::TabletReplica::RegisterMaintenanceOps() > @ 0xa0055b kudu::tserver::TSTabletManager::OpenTablet() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > 2 threads with same stack: > TID 43795(MaintenanceMgr ): > TID 43796(MaintenanceMgr ): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x239a064 kudu::MaintenanceManager::LaunchOp() > @ 0x23f994c 
kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > {code} > A couple more stacks show some work being done by the maintenance manager: > {code:java} > TID 43794(MaintenanceMgr ): > @ 0x7f1dd57147e0 (unknown) > @ 0xba7b41 > kudu::tablet::BudgetedCompactionPolicy::RunApproximation() > @ 0xba8f5d > kudu::tablet::BudgetedCompactionPolicy::PickRowSets() > @ 0xb5b1a1 kudu::tablet::Tablet::Pic
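A generic illustration of the contention pattern in the stacks above and one common way to reduce it; the actual fix that shipped in 1.14.0 may differ, and the names below are hypothetical:

{code:java}
// Holding the registry lock while computing op stats blocks RegisterOp() (and
// thus tablet-open threads). Snapshotting the op list under the lock and doing
// the expensive stat computation after releasing it avoids that.
// Note: a real implementation must also keep ops alive across UnregisterOp().
#include <mutex>
#include <vector>

struct OpStats { double perf_improvement = 0; };

class MaintenanceOpLike {
 public:
  virtual ~MaintenanceOpLike() = default;
  virtual OpStats UpdateStats() = 0;  // potentially slow, e.g. compaction policy
};

class OpRegistrySketch {
 public:
  void RegisterOp(MaintenanceOpLike* op) {
    std::lock_guard<std::mutex> l(lock_);   // held only briefly
    ops_.push_back(op);
  }
  std::vector<OpStats> ComputeAllStats() {
    std::vector<MaintenanceOpLike*> snapshot;
    {
      std::lock_guard<std::mutex> l(lock_);
      snapshot = ops_;                       // copy the pointers under the lock
    }
    std::vector<OpStats> stats;
    for (MaintenanceOpLike* op : snapshot) {
      stats.push_back(op->UpdateStats());    // expensive work, no lock held
    }
    return stats;
  }
 private:
  std::mutex lock_;
  std::vector<MaintenanceOpLike*> ops_;
};
{code}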
[jira] [Assigned] (KUDU-3149) Lock contention between registering ops and computing maintenance op stats
[ https://issues.apache.org/jira/browse/KUDU-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong reassigned KUDU-3149: - Assignee: Andrew Wong > Lock contention between registering ops and computing maintenance op stats > -- > > Key: KUDU-3149 > URL: https://issues.apache.org/jira/browse/KUDU-3149 > Project: Kudu > Issue Type: Bug > Components: perf, tserver >Reporter: Andrew Wong >Assignee: Andrew Wong >Priority: Critical > Fix For: 1.14.0 > > > We saw a bunch of tablets bootstrapping extremely slowly, and many stuck > supposedly bootstrapping, but not showing up in the {{/tablets}} page, i.e. > we could only see INITIALIZED and RUNNING tablets, no BOOTSTRAPPING. > Upon digging into the stacks, we saw a bunch waiting to acquire the MM lock: > {code:java} > TID 46577(tablet-open [wo): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x23980ff kudu::MaintenanceManager::RegisterOp() > @ 0xb59b99 kudu::tablet::Tablet::RegisterMaintenanceOps() > @ 0xb855a1 > kudu::tablet::TabletReplica::RegisterMaintenanceOps() > @ 0xa0055b kudu::tserver::TSTabletManager::OpenTablet() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > TID 46574(tablet-open [wo): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x23980ff kudu::MaintenanceManager::RegisterOp() > @ 0xb59c74 kudu::tablet::Tablet::RegisterMaintenanceOps() > @ 0xb855a1 > kudu::tablet::TabletReplica::RegisterMaintenanceOps() > @ 0xa0055b kudu::tserver::TSTabletManager::OpenTablet() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > 7 threads with same stack: > TID 46575(tablet-open [wo): > TID 46576(tablet-open [wo): > TID 46578(tablet-open [wo): > TID 46580(tablet-open [wo): > TID 46581(tablet-open [wo): > TID 46582(tablet-open [wo): > TID 46583(tablet-open [wo): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x23980ff kudu::MaintenanceManager::RegisterOp() > @ 0xb85374 > kudu::tablet::TabletReplica::RegisterMaintenanceOps() > @ 0xa0055b kudu::tserver::TSTabletManager::OpenTablet() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > TID 46573(tablet-open [wo): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x23980ff kudu::MaintenanceManager::RegisterOp() > @ 0xb854c7 > kudu::tablet::TabletReplica::RegisterMaintenanceOps() > @ 0xa0055b kudu::tserver::TSTabletManager::OpenTablet() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > 2 threads with same stack: > TID 43795(MaintenanceMgr ): > TID 43796(MaintenanceMgr ): > @ 0x7f1dd57147e0 (unknown) > @ 0x7f1dd5713332 (unknown) > @ 0x7f1dd570e5d8 (unknown) > @ 0x7f1dd570e4a7 (unknown) > @ 0x23b4058 kudu::Mutex::Acquire() > @ 0x239a064 kudu::MaintenanceManager::LaunchOp() > @ 0x23f994c kudu::ThreadPool::DispatchThread() > @ 0x23f3f8b 
kudu::Thread::SuperviseThread() > @ 0x7f1dd570caa1 (unknown) > @ 0x7f1dd3b18bcd (unknown) > {code} > A couple more stacks show some work being done by the maintenance manager: > {code:java} > TID 43794(MaintenanceMgr ): > @ 0x7f1dd57147e0 (unknown) > @ 0xba7b41 > kudu::tablet::BudgetedCompactionPolicy::RunApproximation() > @ 0xba8f5d > kudu::tablet::BudgetedCompactionPolicy::PickRowSets() > @ 0xb5b1a1 kudu::tablet::Tablet::PickRowSetsToCompact() > @
[jira] [Created] (KUDU-3203) Allow clients to support reading decimals with wider bit-width
Andrew Wong created KUDU-3203: - Summary: Allow clients to support reading decimals with wider bit-width Key: KUDU-3203 URL: https://issues.apache.org/jira/browse/KUDU-3203 Project: Kudu Issue Type: Improvement Components: client Reporter: Andrew Wong Today, decimal bit-width is entirely determined by Kudu. When creating a schema of a given precision and scale, Kudu determines the correct bit-width for the parameters, and uses that to store values. Client scanners can only specify reading DECIMAL (ignorant of bit-width). In requesting the columnar layout, however, it'd be nice if client scanners could also specify the desired bit-width to get back from tservers, and have the tservers inflate values as appropriate. This would be helpful, e.g. to read DECIMAL32- and DECIMAL64-stored data in Arrow, which currently only supports DECIMAL128 and DECIMAL256. -- This message was sent by Atlassian Jira (v8.3.4#803005)
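A small sketch of what the requested "inflation" amounts to, assuming the unscaled decimal value is stored as a narrower integer; the function name and types are illustrative, not the actual client or tserver API:

{code:java}
#include <cstdint>

// Widening a DECIMAL32 value to a 128-bit representation is a sign-extending
// cast of the unscaled integer; the precision and scale metadata are unchanged.
// E.g. the unscaled value 12345 with scale 2 represents 123.45 in either width.
__int128 WidenDecimal32To128(int32_t unscaled_value) {
  return static_cast<__int128>(unscaled_value);
}
{code}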
[jira] [Resolved] (KUDU-3071) Expose splitKeyRanges in the C++ clients
[ https://issues.apache.org/jira/browse/KUDU-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-3071. --- Fix Version/s: n/a Resolution: Won't Fix Impala can actually leverage the split key ranges feature already because it would (and does, following IMPALA-9792) generate tokens from its frontend. > Expose splitKeyRanges in the C++ clients > > > Key: KUDU-3071 > URL: https://issues.apache.org/jira/browse/KUDU-3071 > Project: Kudu > Issue Type: Improvement > Components: client, perf >Reporter: Andrew Wong >Priority: Major > Fix For: n/a > > > KUDU-2437 introduced the server-side ability to return "split keys" that > logically divide a given tablet into key ranges. KUDU-2670 introduced an > improvement in the Spark integration's KuduRDD to allow Spark to use this to > generate smaller-scoped scan tokens that each scan a chunk of a tablet > instead of entire tablets. This decoupled a table's partitioning scheme > from its read concurrency limitations. > It'd be great if we could expose chunked-token-hydration in the C++ client so > that Impala can begin generating these chunked tokens and then hydrating them > into smaller scanners in its backend. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3161) Include FileSystem Path in UUID Mismatch Error
[ https://issues.apache.org/jira/browse/KUDU-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong updated KUDU-3161: -- Labels: newbie (was: ) > Include FileSystem Path in UUID Mismatch Error > -- > > Key: KUDU-3161 > URL: https://issues.apache.org/jira/browse/KUDU-3161 > Project: Kudu > Issue Type: Improvement >Reporter: David Mollitor >Priority: Minor > Labels: newbie > > {code:none} > Check failed: _s.ok() Bad status: Corruption: Failed to load FS layout: > Mismatched UUIDs across filesystem roots: 2935c5f89ee45654bfcf3bf67569f11f > vs. d7af50d73dae4fa38de386bc583cab22; configuring multiple Kudu processes > with the same directory is not supported > {code} > Please enhance this logging to dump the UUID and location of each file system > so that the problematic directory(ies) can be quickly determined. -- This message was sent by Atlassian Jira (v8.3.4#803005)
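A rough sketch of the requested output, assuming the caller has each filesystem root's path and the UUID read from its instance file; the names here are hypothetical, not the actual FsManager code:

{code:java}
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Builds a message that pairs every root with its UUID so the offending
// directory (or directories) can be identified at a glance.
std::string MismatchedUuidMessage(
    const std::vector<std::pair<std::string, std::string>>& path_and_uuid) {
  std::ostringstream msg;
  msg << "Mismatched UUIDs across filesystem roots:";
  for (const auto& entry : path_and_uuid) {
    msg << "\n  " << entry.first << ": " << entry.second;
  }
  return msg.str();
}
{code}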
[jira] [Comment Edited] (KUDU-3197) Tablet keeps all history schemas in memory may result in high memory consumption
[ https://issues.apache.org/jira/browse/KUDU-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211340#comment-17211340 ] Andrew Wong edited comment on KUDU-3197 at 10/9/20, 8:16 PM: - You're right in that the scanner creates a schema to represent its projection. However, the underlying iterators may take references to the current schema while iterating, the tablet service might take references while preparing to scan, etc. While most if not all of these accesses are short-lived, we need to be careful not to destruct the schemas while these references still exist. Grepping around to audit current usages (with some false positives for copies and log messages): {code:java} ~/Repositories/kudu/src/kudu > grep -r "meta.*[^_]schema()" . ./tablet/tablet-schema-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet-schema-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet-schema-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet-schema-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet-schema-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet_metadata.cc:if (!(*metadata)->schema().Equals(schema)) { ./tablet/tablet_metadata.cc:"match expected schema ($1)", (*metadata)->schema().ToString(), ./tablet/diff_scan-test.cc: SchemaBuilder builder(tablet->metadata()->schema()); ./tablet/tablet_replica-test.cc: ASSERT_OK(SchemaToPB(SchemaBuilder(tablet()->metadata()->schema()).Build(), &orig_schema_pb)); ./tablet/tablet_replica-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet_replica-test.cc: ASSERT_OK(SchemaToPB(SchemaBuilder(tablet()->metadata()->schema()).Build(), &orig_schema_pb)); ./tablet/tablet_replica-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet.cc: : key_schema_(metadata->schema().CreateKeyProjection()), ./tablet/tablet.cc: metadata_->SetSchema(*op_state->schema(), op_state->schema_version()); ./tablet/tablet.cc: RollingDiskRowSetWriter drsw(metadata_.get(), merge->schema(), DefaultBloomSizing(), ./tablet/rowset_metadata.h:return tablet_metadata_->schema(); ./tablet/tablet.h:return &metadata_->schema(); ./tablet/all_types-scan-correctness-test.cc:SchemaBuilder builder(tablet()->metadata()->schema()); ./tools/kudu-tool-test.cc: .PartitionDebugString(meta->partition(), meta->schema()); ./tools/kudu-tool-test.cc: debug_str = meta->schema().ToString(); ./tools/kudu-tool-test.cc:.PartitionDebugString(meta->partition(), meta->schema()); ./tools/kudu-tool-test.cc:debug_str = meta->schema().ToString(); ./tools/tool_action_local_replica.cc:const auto& col_idx = meta->schema().find_column_by_id(col_id); ./tools/tool_action_local_replica.cc: meta->schema().column(col_idx).name() : "?"); ./tools/tool_action_local_replica.cc: Schema schema = meta->schema(); ./tools/tool_action_local_replica.cc: const Schema& schema = meta->schema(); ./tools/tool_action_local_replica.cc: meta->schema()) ./tserver/tablet_service.cc:Schema tablet_schema = replica->tablet_metadata()->schema(); ./tserver/tablet_service.cc: const auto& schema = replica->tablet_metadata()->schema(); ./tserver/tablet_service.cc: CHECK_OK(SchemaToPB(replica->tablet_metadata()->schema(), ./tserver/tablet_service.cc: const auto& schema = replica->tablet_metadata()->schema(); ./tserver/tablet_service.cc: Schema tablet_schema = replica->tablet_metadata()->schema(); ./tserver/tablet_service.cc: const auto& schema 
= replica->tablet_metadata()->schema(); ./tserver/tablet_service.cc: const Schema& tablet_schema = replica->tablet_metadata()->schema(); ./tserver/scanners.cc: spec().lower_bound_key()->Stringify(tablet_metadata->schema(); ./tserver/scanners.cc: spec().exclusive_upper_bound_key()->Stringify(tablet_metadata->schema(); ./tserver/tserver_path_handlers.cc: tmeta->schema()); ./tserver/tserver_path_handlers.cc: const Schema& schema = tmeta->schema(); ./master/sys_catalog.cc: if (!metadata->schema().Equals(BuildTableSchema())) { ./master/sys_catalog.cc:return(Status::Corruption("Unexpected schema", metadata->schema().ToString())); ./client/scan_token-internal.cc: RETURN_NOT_OK(SchemaFromPB(metadata.schema(), &schema)); ./client/client-test.cc:Schema schema = tablet_replica->tablet()->metadata()->schema(); ./client/client-test.cc:Schema schema = tablet_replica->tablet()->metadata()->schema(); ./client/client-test.cc:Schema schema = tablet_replica->tablet()->metadata()->schema(); ./client/client-test.cc:Schema schema = tablet_replica->tablet()->metada
[jira] [Commented] (KUDU-3197) Tablet keeps all history schemas in memory may result in high memory consumption
[ https://issues.apache.org/jira/browse/KUDU-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211340#comment-17211340 ] Andrew Wong commented on KUDU-3197: --- You're right in that the scanner creates a schema to represent its projection. However, the underlying iterators may take references to the current schema while iterating, the tablet service might take references while preparing to scan, etc. While most if not all of these accesses are short-lived, we need to be careful not to destruct the schemas while these references still exist. Grepping around to audit current usages (with some false positives for copies and log messages): {code:java} ~/Repositories/kudu/src/kudu > grep -r "meta.*[^_]schema()" . ./tablet/tablet-schema-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet-schema-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet-schema-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet-schema-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet-schema-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet_metadata.cc:if (!(*metadata)->schema().Equals(schema)) { ./tablet/tablet_metadata.cc:"match expected schema ($1)", (*metadata)->schema().ToString(), ./tablet/diff_scan-test.cc: SchemaBuilder builder(tablet->metadata()->schema()); ./tablet/tablet_replica-test.cc: ASSERT_OK(SchemaToPB(SchemaBuilder(tablet()->metadata()->schema()).Build(), &orig_schema_pb)); ./tablet/tablet_replica-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet_replica-test.cc: ASSERT_OK(SchemaToPB(SchemaBuilder(tablet()->metadata()->schema()).Build(), &orig_schema_pb)); ./tablet/tablet_replica-test.cc: SchemaBuilder builder(tablet()->metadata()->schema()); ./tablet/tablet.cc: : key_schema_(metadata->schema().CreateKeyProjection()), ./tablet/tablet.cc: metadata_->SetSchema(*op_state->schema(), op_state->schema_version()); ./tablet/tablet.cc: RollingDiskRowSetWriter drsw(metadata_.get(), merge->schema(), DefaultBloomSizing(), ./tablet/rowset_metadata.h:return tablet_metadata_->schema(); ./tablet/tablet.h:return &metadata_->schema(); ./tablet/all_types-scan-correctness-test.cc:SchemaBuilder builder(tablet()->metadata()->schema()); ./tools/kudu-tool-test.cc: .PartitionDebugString(meta->partition(), meta->schema()); ./tools/kudu-tool-test.cc: debug_str = meta->schema().ToString(); ./tools/kudu-tool-test.cc:.PartitionDebugString(meta->partition(), meta->schema()); ./tools/kudu-tool-test.cc:debug_str = meta->schema().ToString(); ./tools/tool_action_local_replica.cc:const auto& col_idx = meta->schema().find_column_by_id(col_id); ./tools/tool_action_local_replica.cc: meta->schema().column(col_idx).name() : "?"); ./tools/tool_action_local_replica.cc: Schema schema = meta->schema(); ./tools/tool_action_local_replica.cc: const Schema& schema = meta->schema(); ./tools/tool_action_local_replica.cc: meta->schema()) ./tserver/tablet_service.cc:Schema tablet_schema = replica->tablet_metadata()->schema(); ./tserver/tablet_service.cc: const auto& schema = replica->tablet_metadata()->schema(); ./tserver/tablet_service.cc: CHECK_OK(SchemaToPB(replica->tablet_metadata()->schema(), ./tserver/tablet_service.cc: const auto& schema = replica->tablet_metadata()->schema(); ./tserver/tablet_service.cc: Schema tablet_schema = replica->tablet_metadata()->schema(); ./tserver/tablet_service.cc: const auto& schema = 
replica->tablet_metadata()->schema(); ./tserver/tablet_service.cc: const Schema& tablet_schema = replica->tablet_metadata()->schema(); ./tserver/scanners.cc: spec().lower_bound_key()->Stringify(tablet_metadata->schema(); ./tserver/scanners.cc: spec().exclusive_upper_bound_key()->Stringify(tablet_metadata->schema(); ./tserver/tserver_path_handlers.cc: tmeta->schema()); ./tserver/tserver_path_handlers.cc: const Schema& schema = tmeta->schema(); ./master/sys_catalog.cc: if (!metadata->schema().Equals(BuildTableSchema())) { ./master/sys_catalog.cc:return(Status::Corruption("Unexpected schema", metadata->schema().ToString())); ./client/scan_token-internal.cc: RETURN_NOT_OK(SchemaFromPB(metadata.schema(), &schema)); ./client/client-test.cc:Schema schema = tablet_replica->tablet()->metadata()->schema(); ./client/client-test.cc:Schema schema = tablet_replica->tablet()->metadata()->schema(); ./client/client-test.cc:Schema schema = tablet_replica->tablet()->metadata()->schema(); ./client/client-test.cc:Schema schema = tablet_replica->tablet()->metadata()->schema(); ./client/client-test.cc:Schema
[jira] [Commented] (KUDU-3197) Tablet keeps all history schemas in memory may result in high memory consumption
[ https://issues.apache.org/jira/browse/KUDU-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203521#comment-17203521 ] Andrew Wong commented on KUDU-3197: --- Yeah, if I understand the concerns about old schemas here, the proposed approach seems pretty unsafe. If a scan lasts longer than the time it takes to update the schema 20 times, we might hit a segfault. After a brief look around for how the tablet metadatas' schemas are used on the tablet servers, for a ref-counting solution, it's probably worth identifying the top-level "owners" of the current schema pointers, i.e. the current callers of {{TabletMetadata::schema()}}, and ensuring that any references that those owners pass around either outlive the owners themselves or are also ref-counted. > Tablet keeps all history schemas in memory may result in high memory > consumption > > > Key: KUDU-3197 > URL: https://issues.apache.org/jira/browse/KUDU-3197 > Project: Kudu > Issue Type: Improvement > Components: tablet >Affects Versions: 1.12.0 >Reporter: wangningito >Assignee: wangningito >Priority: Minor > Attachments: image-2020-09-25-14-45-33-402.png, > image-2020-09-25-14-49-30-913.png, image-2020-09-25-15-05-44-948.png > > > When a table is altered at a high frequency, the memory consumption of > kudu-tserver may be very high, and that memory is not tracked in the memory > page. > This is the memory usage of a tablet: tablet-xxx's peak memory consumption is > 3.6G, but none of its children's memory comes close to that. > !image-2020-09-25-14-45-33-402.png! > So I used pprof to get a heap sampling. The tserver had been up for a long time, > but memory was still being consumed by TabletBootstrap::PlayAlterSchemaRequest. > !image-2020-09-25-14-49-30-913.png! > I changed `old_schemas_` in tablet_metadata.h to a fixed-size vector, > // Previous values of 'schema_'. > // These are currently kept alive forever, under the assumption that > // a given tablet won't have thousands of "alter table" calls. > // They are kept alive so that callers of schema() don't need to > // worry about reference counting or locking. > std::vector<Schema*> old_schemas_; > The heap sampling then becomes: > !image-2020-09-25-15-05-44-948.png! > So, to make the application layer more flexible, it could be better to make the > size of old_schemas_ configurable. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
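A minimal sketch of the ref-counting direction described in the comment above, assuming callers can be migrated to hold shared ownership; the class and member names are stand-ins, not the actual TabletMetadata interface:

{code:java}
#include <memory>
#include <mutex>
#include <utility>

class Schema;  // stand-in for kudu::Schema

// If the metadata handed out std::shared_ptr<const Schema>, an old schema
// would be freed as soon as the last scanner/iterator referencing it went
// away, and old_schemas_ would no longer need to keep every version alive.
class MetadataSketch {
 public:
  std::shared_ptr<const Schema> schema() const {
    std::lock_guard<std::mutex> l(lock_);
    return schema_;                       // caller shares ownership
  }
  void SetSchema(std::shared_ptr<const Schema> new_schema) {
    std::lock_guard<std::mutex> l(lock_);
    schema_ = std::move(new_schema);      // old schema freed once unreferenced
  }
 private:
  mutable std::mutex lock_;
  std::shared_ptr<const Schema> schema_;
};
{code}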
[jira] [Commented] (KUDU-3134) Adjust default value for --raft_heartbeat_interval
[ https://issues.apache.org/jira/browse/KUDU-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203499#comment-17203499 ] Andrew Wong commented on KUDU-3134: --- It's worth noting that an increased heartbeat interval has implications on scans. Safe time is currently updated on followers via heartbeats from the leader, and one of the first things we do in snapshot scans as followers is wait for safe time to be advanced past the snapshot timestamp. As such, if we set a high heartbeat interval, scans to followers may end up timing out waiting for the safetime to be bumped. https://github.com/apache/kudu/blob/20fde59bca1f9df5a3cdee48f7794e0e8f16784a/src/kudu/tserver/tablet_service.cc#L3101 > Adjust default value for --raft_heartbeat_interval > -- > > Key: KUDU-3134 > URL: https://issues.apache.org/jira/browse/KUDU-3134 > Project: Kudu > Issue Type: Improvement >Affects Versions: 1.12.0 >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > > Users often increase the `--raft_heartbeat_interval` on larger clusters or on > clusters with high replica counts. This helps avoid the servers flooding each > other with heartbeat RPCs causing queue overflows and using too much idle > CPU. Users have adjusted the values from 1.5 seconds to as high as 10s and we > have never seen people complain about problems after doing so. > Anecdotally, I recently saw a cluster with 4k tablets per tablet server using > ~150% cpu usage while idle. By increasing the `--raft_heartbeat_interval` > from 500ms to 1500ms the cpu usage dropped to ~50%. > Generally speaking users often care about Kudu stability and scalability over > an extremely short MTTR. Additionally our default client RPC timeouts of 30s > also seem to indicate slightly longer failover/retry times are tolerable in > the default case. > We should consider adjusting the default value of `--raft_heartbeat_interval` > to a higher value to support larger and more efficient clusters by default. > Users who need a low MTTR can always adjust the value lower while also > adjusting other related timeouts. We may also want to consider adjusting the > default `--heartbeat_interval_ms` accordingly. > Note: Batching the RPCs like mentioned in KUDU-1973 or providing a server to > server proxy for heartbeating may be a way to solve the issues without > adjusting the default configuration. However, adjusting the configuration is > easy and has proven effective in production deployments. Additionally > adjusting the defaults along with a KUDU-1973 like approach could lead to > even lower idle resource usage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
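A simplified sketch of the follower-side wait described above; this is not the tablet_service.cc code linked, and the safe-time accessor is a hypothetical stand-in. It just shows why a very large heartbeat interval eats into the scanner's timeout budget:

{code:java}
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Returns true if the replica's safe time passed 'snapshot_ts' before 'timeout'.
// Followers only advance safe time on leader heartbeats, so with
// --raft_heartbeat_interval_ms set very high this wait can consume most or all
// of the scan timeout.
bool WaitForSafeTime(const std::function<int64_t()>& get_safe_time,
                     int64_t snapshot_ts,
                     std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (get_safe_time() < snapshot_ts) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // the scan would fail with a timeout here
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  return true;
}
{code}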
[jira] [Assigned] (KUDU-3191) Fail tablet replicas that suffer from KUDU-2233 instead of crashing
[ https://issues.apache.org/jira/browse/KUDU-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong reassigned KUDU-3191: - Assignee: Andrew Wong > Fail tablet replicas that suffer from KUDU-2233 instead of crashing > --- > > Key: KUDU-3191 > URL: https://issues.apache.org/jira/browse/KUDU-3191 > Project: Kudu > Issue Type: Task > Components: compaction >Reporter: Andrew Wong >Assignee: Andrew Wong >Priority: Major > > KUDU-2233 results in persisted corruption that causes a broken invariant, > leading to a server crash. The recovery process for this corruption is > arduous, especially if there are multiple tablet replicas in a given server > that suffer from it -- users typically start the server, see the crash, > remove the affected replica manually via tooling, and restart, repeatedly > until the server comes up healthily. > Instead, we should consider treating this as we do CFile block-level > corruption[1] and fail the tablet replica. At best, we end up recovering from > a non-corrupted replica. At worst, we'd end up with multiple corrupted > replicas, which is still better than what we have today, which is multiple > corrupted replicas and unavailable servers that lead to excessive > re-replication. > [1] > https://github.com/apache/kudu/commit/cf6927cb153f384afb649b664de1d4276bd6d83f -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3193) Per-tablet histogram for scan predicate efficiency
[ https://issues.apache.org/jira/browse/KUDU-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong updated KUDU-3193: -- Description: Often times slow queries can be the result of a sub-optimal schema for a given workload, e.g. if a scan's predicate is not on a prefix of the primary key. Diagnosing such issues typically takes some understanding of the workloads that are being run against a given table. It'd be nice if there were something more quantitative to understand whether a table(t)'s schema is to blame for a slow scan. One thought that comes to mind is maintaining a histogram metric per-tablet of the ratio between the number of rows returned during a given scan and the number of rows iterated through during that scan. A consistently low value of this metric would indicate that predicates applied to the given tablet are doing a lot of IO reading rows that are not in the results set. was: Often times slow queries can be the result of a sub-optimal schema for a given workload, e.g. if a scan's predicate is not on a prefix of the primary key. Diagnosing such issues typically takes some understanding of the workloads that are being run against a given table. It'd be nice if there were something more quantitative to understand whether a table(t)'s schema is to blame for a slow scan. One thought that comes to mind is maintaining a histogram metric per-tablet of the ratio between the number of rows returned during a given scan and the number of rows iterated through during that scan. A consistently low value of this metric would indicate that predicates applied to the given tablet are not very effective. > Per-tablet histogram for scan predicate efficiency > -- > > Key: KUDU-3193 > URL: https://issues.apache.org/jira/browse/KUDU-3193 > Project: Kudu > Issue Type: Task > Components: metrics, ops-tooling, perf, tablet >Reporter: Andrew Wong >Priority: Major > > Often times slow queries can be the result of a sub-optimal schema for a > given workload, e.g. if a scan's predicate is not on a prefix of the primary > key. Diagnosing such issues typically takes some understanding of the > workloads that are being run against a given table. It'd be nice if there > were something more quantitative to understand whether a table(t)'s schema is > to blame for a slow scan. > One thought that comes to mind is maintaining a histogram metric per-tablet > of the ratio between the number of rows returned during a given scan and the > number of rows iterated through during that scan. A consistently low value of > this metric would indicate that predicates applied to the given tablet are > doing a lot of IO reading rows that are not in the results set. -- This message was sent by Atlassian Jira (v8.3.4#803005)
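A small sketch of the per-scan value the proposed histogram would record (names are hypothetical): the fraction of iterated rows that made it into the result set. For example, a scan that iterates over 1,000,000 rows but returns only 1,000 of them would record 0.001.

{code:java}
#include <cstdint>

double PredicateEfficiency(int64_t rows_returned, int64_t rows_iterated) {
  if (rows_iterated == 0) return 1.0;  // nothing was read, so nothing was wasted
  return static_cast<double>(rows_returned) / static_cast<double>(rows_iterated);
}
{code}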
[jira] [Updated] (KUDU-3193) Per-tablet histogram for scan predicate efficiency
[ https://issues.apache.org/jira/browse/KUDU-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong updated KUDU-3193: -- Description: Often times slow queries can be the result of a sub-optimal schema for a given workload, e.g. if a scan's predicate is not on a prefix of the primary key. Diagnosing such issues typically takes some understanding of the workloads that are being run against a given table. It'd be nice if there were something more quantitative to understand whether a table(t)'s schema is to blame for a slow scan. One thought that comes to mind is maintaining a histogram metric per-tablet of the ratio between the number of rows returned during a given scan and the number of rows iterated through during that scan. A consistently low value of this metric would indicate that predicates applied to the given tablet are not very effective. was: Often times slow queries can be the result of a sub-optimal schema for a given workload, e.g. if a scan's predicate is not on a prefix of the primary key. Diagnosing such issues typically takes some understanding of the workloads that are being run against a given table. It'd be nice if there were something more quantitative to understand whether a table(t)'s schema is to blame for a slow scan. One thought that comes to mind is maintaining a histogram metric per-tablet of the ratio between the number of rows returned during a given and the number of rows iterated through during that scan. A consistently low value of this metric would indicate that predicates applied to the given tablet are not very effective. > Per-tablet histogram for scan predicate efficiency > -- > > Key: KUDU-3193 > URL: https://issues.apache.org/jira/browse/KUDU-3193 > Project: Kudu > Issue Type: Task > Components: metrics, ops-tooling, perf, tablet >Reporter: Andrew Wong >Priority: Major > > Often times slow queries can be the result of a sub-optimal schema for a > given workload, e.g. if a scan's predicate is not on a prefix of the primary > key. Diagnosing such issues typically takes some understanding of the > workloads that are being run against a given table. It'd be nice if there > were something more quantitative to understand whether a table(t)'s schema is > to blame for a slow scan. > One thought that comes to mind is maintaining a histogram metric per-tablet > of the ratio between the number of rows returned during a given scan and the > number of rows iterated through during that scan. A consistently low value of > this metric would indicate that predicates applied to the given tablet are > not very effective. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3193) Per-tablet histogram for scan predicate efficiency
Andrew Wong created KUDU-3193: - Summary: Per-tablet histogram for scan predicate efficiency Key: KUDU-3193 URL: https://issues.apache.org/jira/browse/KUDU-3193 Project: Kudu Issue Type: Task Components: metrics, ops-tooling, perf, tablet Reporter: Andrew Wong Often times slow queries can be the result of a sub-optimal schema for a given workload, e.g. if a scan's predicate is not on a prefix of the primary key. Diagnosing such issues typically takes some understanding of the workloads that are being run against a given table. It'd be nice if there were something more quantitative to understand whether a table(t)'s schema is to blame for a slow scan. One thought that comes to mind is maintaining a histogram metric per-tablet of the ratio between the number of rows returned during a given and the number of rows iterated through during that scan. A consistently low value of this metric would indicate that predicates applied to the given tablet are not very effective. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-3192) Leverage cluster ID when playing HMS notifications
Andrew Wong created KUDU-3192: - Summary: Leverage cluster ID when playing HMS notifications Key: KUDU-3192 URL: https://issues.apache.org/jira/browse/KUDU-3192 Project: Kudu Issue Type: Task Components: hms Reporter: Andrew Wong KUDU-2574 added a unique cluster ID to the master system catalog table. We should leverage this with the HMS integration by 1) synchronizing the cluster ID to the HMS, storing it as a part of the table JSON, and 2) filtering the HMS notifications received by the HMS log listener based on cluster ID. -- This message was sent by Atlassian Jira (v8.3.4#803005)
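A rough sketch of the filtering half of this (item 2), with hypothetical names rather than the real HMS notification listener code:

{code:java}
#include <string>

// Skip notification events whose recorded cluster ID belongs to another Kudu
// cluster sharing the same HMS. How to treat events that predate the cluster
// ID (and so carry none) is a policy choice; here they are processed.
bool ShouldProcessHmsEvent(const std::string& event_cluster_id,
                           const std::string& local_cluster_id) {
  if (event_cluster_id.empty()) return true;
  return event_cluster_id == local_cluster_id;
}
{code}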
[jira] [Created] (KUDU-3191) Fail tablet replicas that suffer from KUDU-2233 instead of crashing
Andrew Wong created KUDU-3191: - Summary: Fail tablet replicas that suffer from KUDU-2233 instead of crashing Key: KUDU-3191 URL: https://issues.apache.org/jira/browse/KUDU-3191 Project: Kudu Issue Type: Task Components: compaction Reporter: Andrew Wong KUDU-2233 results in persisted corruption that causes a broken invariant, leading to a server crash. The recovery process for this corruption is arduous, especially if there are multiple tablet replicas in a given server that suffer from it -- users typically start the server, see the crash, remove the affected replica manually via tooling, and restart, repeatedly until the server comes up healthily. Instead, we should consider treating this as we do CFile block-level corruption[1] and fail the tablet replica. At best, we end up recovering from a non-corrupted replica. At worst, we'd end up with multiple corrupted replicas, which is still better than what we have today, which is multiple corrupted replicas and unavailable servers that lead to excessive re-replication. [1] https://github.com/apache/kudu/commit/cf6927cb153f384afb649b664de1d4276bd6d83f -- This message was sent by Atlassian Jira (v8.3.4#803005)
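A sketch of the proposed direction, using a plain error return in place of the current CHECK failure; the names are hypothetical, and a real change would go through Kudu's Status plumbing and the same replica-failure path used for CFile corruption:

{code:java}
#include <string>

enum class RowsetCheck { kOk, kCorruption };

// Instead of CHECK-failing (which takes down the whole tablet server), report
// the broken invariant so the caller can mark only the affected replica failed.
RowsetCheck CheckCompactionInvariant(bool invariant_holds, std::string* error_detail) {
  if (!invariant_holds) {
    *error_detail = "KUDU-2233-style invariant violation detected in rowset";
    return RowsetCheck::kCorruption;
  }
  return RowsetCheck::kOk;
}
{code}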
[jira] [Resolved] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
[ https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wong resolved KUDU-3119. --- Fix Version/s: 1.12.0 Resolution: Fixed As far as I can tell (based on the logs and based on Cloudera's internal test-triaging history), the attached logs are all from Kudu 1.10, which doesn't have the fix for https://github.com/greg7mdp/sparsepp/issues/42. The version of sparsepp was bumped in 1.12 with [0fdfdc8|http://github.com/apache/kudu/commit/0fdfdc8]. > ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN > --- > > Key: KUDU-3119 > URL: https://issues.apache.org/jira/browse/KUDU-3119 > Project: Kudu > Issue Type: Bug > Components: CLI, test >Reporter: Alexey Serbin >Priority: Blocker > Fix For: 1.12.0 > > Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.3.txt.xz, > kudu-tool-test.log.xz > > > Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} > reports races for TSAN builds: > {noformat} > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266: > Failure > Failed > Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: > process exited with non-ze > ro status 66 > Google Test trace: > /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265: > W0506 17:5 > 6:02.744191 4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true > I0506 17:56:02.780252 4432 fs_manager.cc:263] Metadata directory not provided > I0506 17:56:02.780442 4432 fs_manager.cc:269] Using write-ahead log > directory (fs_wal_dir) as metad > ata directory > I0506 17:56:02.789638 4432 fs_manager.cc:399] Time spent opening directory > manager: real 0.007s > user 0.005s sys 0.002s > I0506 17:56:02.789986 4432 env_posix.cc:1676] Not raising this process' open > files per process limi > t of 1048576; it is already as high as it can go > I0506 17:56:02.790426 4432 file_cache.cc:465] Constructed file cache lbm > with capacity 419430 > == > WARNING: ThreadSanitizer: data race (pid=4432) > ... > {noformat} > The log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
[ https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176072#comment-17176072 ] Andrew Wong commented on KUDU-3119: --- The race isn't quite where I expected, per the following lines in the logs: {code:java} Write of size 1 at 0x7f82f790a760 by thread T5 (mutexes: write M1638): #0 spp::sparsegroup<>::_sizing(unsigned int) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1103:56 (libkudu_fs.so+0x102d70) #1 void spp::sparsegroup<>::_set_aux<>(kudu::MemTrackerAllocator, std::__1::allocator > >&, unsigned char, std::__1::pair<>&, spp::integral_constant) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1392:31 (libkudu_fs.so+0x102ac8) #2 void spp::sparsegroup<>::_set<>(kudu::MemTrackerAllocator, std::__1::allocator > >&, unsigned char, unsigned char, std::__1::pair<>&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1426:13 (libkudu_fs.so+0x102a56) #3 std::__1::pair<>* spp::sparsegroup<>::set >(kudu::MemTrackerAllocator >, std::__1::allocator > > >&, unsigned char, std::__1::pair >&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1444:9 (libkudu_fs.so+0x10295f) #4 std::__1::pair<>& spp::sparsetable<>::set >(unsigned long, std::__1::pair<>&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:2236:25 (libkudu_fs.so+0x1036ba) #5 std::__1::pair<>& spp::sparse_hashtable<>::_insert_at > >(std::__1::pair >&, unsigned long, bool) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3173:22 (libkudu_fs.so+0x101910) #6 std::__1::pair<>& spp::sparse_hashtable<>::find_or_insert, kudu::BlockIdHash, kudu::BlockIdEqual, kudu::MemTrackerAllocator >, std::__1::allocator > > > >::DefaultValue>(kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3282:28 (libkudu_fs.so+0x1014a1) #7 spp::sparse_hash_map<>::operator[](kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3792:29 (libkudu_fs.so+0xeece0) #8 kudu::fs::LogBlockManager::AddLogBlock(scoped_refptr) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/src/kudu/fs/log_block_manager.cc:2262:32 (libkudu_fs.so+0xe6a27) ... 
Previous read of size 1 at 0x7f82f790a760 by thread T6 (mutexes: write M1637): #0 spp::sparsegroup<>::_sizing(unsigned int) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1088:14 (libkudu_fs.so+0x102d1c) #1 void spp::sparsegroup<>::_set_aux > >(kudu::MemTrackerAllocator >, std::__1::allocator > > >&, unsigned char, std::__1::pair >&, spp::integral_constant) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1392:31 (libkudu_fs.so+0x102ac8) #2 void spp::sparsegroup<>::_set<>(kudu::MemTrackerAllocator, std::__1::allocator<> >&, unsigned char, unsigned char, std::__1::pair<>&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1426:13 (libkudu_fs.so+0x102a56) #3 std::__1::pair<>* spp::sparsegroup<>::set > >(kudu::MemTrackerAllocator >, std::__1::allocator > > >&, unsigned char, std::__1::pair >&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1444:9 (libkudu_fs.so+0x10295f) #4 std::__1::pair<>& spp::sparsetable<>::set > >(unsigned long, std::__1::pair >&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:2236:25 (libkudu_fs.so+0x1036ba) #5 std::__1::pair<>& spp::sparse_hashtable<>::_insert_at > >(std::__1::pair >&, unsigned long, bool) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3173:22 (libkudu_fs.so+0x101910) #6 std::__1::pair<>& spp::sparse_hashtable<>::find_or_insert, kudu::BlockIdHash, kudu::BlockIdEqual, kudu::MemTrackerAllocator >, std::__1::allocator > > > >::DefaultValue>(kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3282:28 (libkudu_fs.so+0x1014a1) #7 spp::sparse_hash_map<>::operator[](kudu::BlockId const&) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3792:29 (libkudu_fs.so+0xeece0) #8 kudu::fs::LogBlockManager::AddLogBlock(scoped_refptr) /data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/src/kudu/fs/log_block_manager.cc:2262:32 (l
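The stack frames above are specific to Kudu's log block manager, but the shape of the report is the general one TSAN looks for: two threads touch the same bytes of a shared hash table while each holds a *different* mutex (M1638 vs. M1637 above), so there is no common lock and no happens-before edge between the accesses. The standalone sketch below reproduces only that shape; it is not the Kudu code path, the map and lock names are stand-ins, and it simply assumes sparsepp's single header is available on the include path.

{code:cpp}
// Standalone illustration of the reported race pattern -- NOT the Kudu code
// path above. Two threads mutate one shared hash map while each holds a
// different mutex, so TSAN sees conflicting accesses with no common lock.
#include <mutex>
#include <thread>

#include <sparsepp/spp.h>  // assumption: sparsepp is on the include path

int main() {
  spp::sparse_hash_map<int, int> blocks;  // shared map (stand-in for the block-id map)
  std::mutex m1, m2;                      // two unrelated locks, like M1638/M1637

  std::thread t1([&] {
    std::lock_guard<std::mutex> l(m1);    // holds only m1
    blocks[1] = 1;                        // modifies shared table metadata
  });
  std::thread t2([&] {
    std::lock_guard<std::mutex> l(m2);    // holds only m2 -- no ordering with t1
    blocks[2] = 2;                        // race: same container, even with distinct keys
  });
  t1.join();
  t2.join();
  return 0;
}
{code}

Making both threads take the same mutex silences a report of this shape. In the Kudu case the conflicting accesses are inside sparsepp itself (the spp.h frames above), which is why the dependency bump noted in the resolution was the relevant fix rather than a change to Kudu's own locking.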
[jira] [Commented] (KUDU-3176) Backup & restore incompatibility
[ https://issues.apache.org/jira/browse/KUDU-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175771#comment-17175771 ] Andrew Wong commented on KUDU-3176: --- What was the error seen here? Do you have application logs for the restore job? Or, at the very least, can you point to what the issue is? > Backup & restore incompatibility > > > Key: KUDU-3176 > URL: https://issues.apache.org/jira/browse/KUDU-3176 > Project: Kudu > Issue Type: Bug >Reporter: Attila Bukor >Assignee: Attila Bukor >Priority: Critical > > The ownership in the backup metadata introduced in KUDU-3090 seems to have > backward/forward compatibility issues: restoring a backup that was created on > a pre-ownership cluster (with the matching backup tool) using a > post-ownership backup tool fails. Other combinations might also fail, but I > haven't reproduced them so far. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-3180) kudu don't always prefer to flush MRS/DMS that anchor more memory
[ https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175080#comment-17175080 ] Andrew Wong commented on KUDU-3180: --- {quote}After we tuned -flush_threshold_secs to 1800(was 3600 before), we could avoid OOM {quote} If the server is running low on memory during these times, wouldn't that put it into memory-pressure mode anyway? If you're on 1.11, that should already schedule a flush for the mem-store that anchors the most memory. And in 1.12, based on the screenshots, we would still schedule some flushes for some fairly large mem-stores. Additionally, I would have expected write requests to also be throttled, further slowing down the memory growth. If there is no memory-pressure despite there being OOMs, I wonder if this could be related to KUDU-3030. {quote}Maybe could use max(memory_size, time_since_last_flush to define perf improvement of a mem-store flush, so that both big mem-stores and long_lived mem-stores could be flushed in priority. {quote} Yeah, my biggest concern is that we don't regress KUDU-3002, since the perf score for {{time_since_last_flush}} is limited. If we go down this route, that may need to be adjusted. > kudu don't always prefer to flush MRS/DMS that anchor more memory > - > > Key: KUDU-3180 > URL: https://issues.apache.org/jira/browse/KUDU-3180 > Project: Kudu > Issue Type: Improvement >Reporter: YifanZhang >Priority: Major > Attachments: image-2020-08-04-20-26-53-749.png, > image-2020-08-04-20-28-00-665.png > > > Current time-based flush policy always give a flush op a high score if we > haven't flushed for the tablet in a long time, that may lead to starvation of > ops that could free more memory. > We set -flush_threshold_mb=32, -flush_threshold_secs=1800 in a cluster, and > find that some small MRS/DMS flushes has a higher perf score than big MRS/DMS > flushes and compactions, which seems not so reasonable. > !image-2020-08-04-20-26-53-749.png|width=1424,height=317!!image-2020-08-04-20-28-00-665.png|width=1414,height=327! -- This message was sent by Atlassian Jira (v8.3.4#803005)
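To make the memory-pressure point above concrete: once the process believes it is under memory pressure, op selection stops being driven by the usual perf score and instead favors whichever op would release the most mem-store memory (which is why an OOM without any sign of memory pressure is the suspicious part). The sketch below is a deliberately simplified illustration of that switch, not the actual maintenance-manager code; the {{MMOp}} struct and its field names are hypothetical stand-ins.

{code:cpp}
// Simplified sketch of the memory-pressure selection switch -- illustrative only.
#include <algorithm>
#include <vector>

struct MMOp {
  double ram_anchored;  // bytes of mem-store memory the op would free (hypothetical field)
  double perf_score;    // score used when not under memory pressure (hypothetical field)
};

const MMOp* PickOp(const std::vector<MMOp>& ops, bool under_memory_pressure) {
  if (ops.empty()) return nullptr;
  if (under_memory_pressure) {
    // Memory pressure: free the most memory first, regardless of perf score.
    return &*std::max_element(ops.begin(), ops.end(),
        [](const MMOp& a, const MMOp& b) { return a.ram_anchored < b.ram_anchored; });
  }
  // Otherwise fall back to the regular perf-score ordering.
  return &*std::max_element(ops.begin(), ops.end(),
      [](const MMOp& a, const MMOp& b) { return a.perf_score < b.perf_score; });
}
{code}

If the servers really were under memory pressure before the OOMs, a selection along these lines should already have favored the largest MRS/DMS flushes and throttled writes, which is why checking for pressure/throttling signals in the logs (and for KUDU-3030-like accounting gaps) is the suggested first step.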
[jira] [Comment Edited] (KUDU-3180) kudu don't always prefer to flush MRS/DMS that anchor more memory
[ https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172948#comment-17172948 ] Andrew Wong edited comment on KUDU-3180 at 8/7/20, 8:07 AM: Looking through the code a bit to explain the 0B logs retained, it seems like logs retained only accounts for the size of ReadableLogSegments, meaning if a WAL segment is still being written to, it will not be accounted for in the space retained estimate. See GetReplaySizeMap() in consensus/log.h for more details. {quote}It's not always true that older or larger mem-stores anchor more WAL bytes as far as I saw on /maintenance-manager page, so maybe we shouldn't always use WAL bytes anchored to determine what to flush.{quote} That's true, but WAL bytes anchored will be somewhat correlated with both the size and the age, not taking into account the above replay size map discrepancy. One question about your particular use case though: would tuning the {{--memory_pressure_percentage}} gflag help at all? If you reduce it significantly, you would guarantee MRS/DMS flushing would be prioritized over compactions. Admittedly, it will use the WAL bytes anchored to prioritize ops, but that should still work out to flush larger mem-stores in insert-mostly workloads. was (Author: andrew.wong): Looking through the code a bit to explain the 0B logs retained, it seems like logs retained only accounts for the size of ReadableLogSegments, meaning if a WAL segment is still being written to, it will be accounted for in the space retained estimate. See GetReplaySizeMap() in consensus/log.h for more details. {quote}It's not always true that older or larger mem-stores anchor more WAL bytes as far as I saw on /maintenance-manager page, so maybe we shouldn't always use WAL bytes anchored to determine what to flush.{quote} That's true, but WAL bytes anchored will be somewhat correlated with both the size and the age, not taking into account the above replay size map discrepancy. One question about your particular use case though: would tuning the {{--memory_pressure_percentage}} gflag help at all? If you reduce it significantly, you would guarantee MRS/DMS flushing would be prioritized over compactions. Admittedly, it will use the WAL bytes anchored to prioritize ops, but that should still work out to flush larger mem-stores in insert-mostly workloads. > kudu don't always prefer to flush MRS/DMS that anchor more memory > - > > Key: KUDU-3180 > URL: https://issues.apache.org/jira/browse/KUDU-3180 > Project: Kudu > Issue Type: Improvement >Reporter: YifanZhang >Priority: Major > Attachments: image-2020-08-04-20-26-53-749.png, > image-2020-08-04-20-28-00-665.png > > > Current time-based flush policy always give a flush op a high score if we > haven't flushed for the tablet in a long time, that may lead to starvation of > ops that could free more memory. > We set -flush_threshold_mb=32, -flush_threshold_secs=1800 in a cluster, and > find that some small MRS/DMS flushes has a higher perf score than big MRS/DMS > flushes and compactions, which seems not so reasonable. > !image-2020-08-04-20-26-53-749.png|width=1424,height=317!!image-2020-08-04-20-28-00-665.png|width=1414,height=327! -- This message was sent by Atlassian Jira (v8.3.4#803005)
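To make the "0 B logs retained" observation concrete: if the retained-bytes estimate only walks closed (readable) segments, then operations anchored solely in the segment currently being appended to contribute nothing to the number shown on /maintenance-manager. The following is a hypothetical simplification of that accounting, not the real GetReplaySizeMap() from consensus/log.h; the struct and function names are illustrative.

{code:cpp}
// Hypothetical simplification of the "WAL bytes retained" accounting -- not
// Kudu's actual implementation. It shows why an anchor whose operations live
// entirely in the still-open segment reads as 0 bytes retained.
#include <cstdint>
#include <vector>

struct Segment {
  int64_t size_bytes;
  int64_t max_op_index;  // highest op index stored in this segment
  bool closed;           // false for the segment currently being appended to
};

// Bytes of WAL that must be kept (and replayed on restart) while
// `anchor_op_index` is still un-flushed.
int64_t RetainedBytesFor(const std::vector<Segment>& segments,
                         int64_t anchor_op_index) {
  int64_t retained = 0;
  for (const Segment& s : segments) {
    if (!s.closed) continue;                 // open segment never counted -> "0 B" readings
    if (s.max_op_index >= anchor_op_index) { // segment holds ops the anchor still needs
      retained += s.size_bytes;
    }
  }
  return retained;
}
{code}

Under this kind of accounting a freshly rolled mem-store can report 0 B retained right up until its segment is closed, even though flushing it would eventually release real WAL space, which is exactly the discrepancy discussed in the comment above.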
[jira] [Commented] (KUDU-3180) kudu don't always prefer to flush MRS/DMS that anchor more memory
[ https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172948#comment-17172948 ] Andrew Wong commented on KUDU-3180: --- Looking through the code a bit, it seems like logs retained only accounts for the size of ReadableLogSegments, meaning if a WAL segment is still being written to, it will be accounted for in the space retained estimate. See GetReplaySizeMap() in consensus/log.h for more details. {quote}It's not always true that older or larger mem-stores anchor more WAL bytes as far as I saw on /maintenance-manager page, so maybe we shouldn't always use WAL bytes anchored to determine what to flush.{quote} That's true, but WAL bytes anchored will be somewhat correlated with both the size and the age, not taking into account the above replay size map discrepancy. One question about your particular use case though: would tuning the {{--memory_pressure_percentage}} gflag help at all? If you reduce it significantly, you would guarantee MRS/DMS flushing would be prioritized over compactions. Admittedly, it will use the WAL bytes anchored to prioritize ops, but that should still work out to flush larger mem-stores in insert-mostly workloads. > kudu don't always prefer to flush MRS/DMS that anchor more memory > - > > Key: KUDU-3180 > URL: https://issues.apache.org/jira/browse/KUDU-3180 > Project: Kudu > Issue Type: Improvement >Reporter: YifanZhang >Priority: Major > Attachments: image-2020-08-04-20-26-53-749.png, > image-2020-08-04-20-28-00-665.png > > > Current time-based flush policy always give a flush op a high score if we > haven't flushed for the tablet in a long time, that may lead to starvation of > ops that could free more memory. > We set -flush_threshold_mb=32, -flush_threshold_secs=1800 in a cluster, and > find that some small MRS/DMS flushes has a higher perf score than big MRS/DMS > flushes and compactions, which seems not so reasonable. > !image-2020-08-04-20-26-53-749.png|width=1424,height=317!!image-2020-08-04-20-28-00-665.png|width=1414,height=327! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KUDU-3180) kudu don't always prefer to flush MRS/DMS that anchor more memory
[ https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172948#comment-17172948 ] Andrew Wong edited comment on KUDU-3180 at 8/7/20, 7:22 AM: Looking through the code a bit to explain the 0B logs retained, it seems like logs retained only accounts for the size of ReadableLogSegments, meaning if a WAL segment is still being written to, it will be accounted for in the space retained estimate. See GetReplaySizeMap() in consensus/log.h for more details. {quote}It's not always true that older or larger mem-stores anchor more WAL bytes as far as I saw on /maintenance-manager page, so maybe we shouldn't always use WAL bytes anchored to determine what to flush.{quote} That's true, but WAL bytes anchored will be somewhat correlated with both the size and the age, not taking into account the above replay size map discrepancy. One question about your particular use case though: would tuning the {{--memory_pressure_percentage}} gflag help at all? If you reduce it significantly, you would guarantee MRS/DMS flushing would be prioritized over compactions. Admittedly, it will use the WAL bytes anchored to prioritize ops, but that should still work out to flush larger mem-stores in insert-mostly workloads. was (Author: andrew.wong): Looking through the code a bit, it seems like logs retained only accounts for the size of ReadableLogSegments, meaning if a WAL segment is still being written to, it will be accounted for in the space retained estimate. See GetReplaySizeMap() in consensus/log.h for more details. {quote}It's not always true that older or larger mem-stores anchor more WAL bytes as far as I saw on /maintenance-manager page, so maybe we shouldn't always use WAL bytes anchored to determine what to flush.{quote} That's true, but WAL bytes anchored will be somewhat correlated with both the size and the age, not taking into account the above replay size map discrepancy. One question about your particular use case though: would tuning the {{--memory_pressure_percentage}} gflag help at all? If you reduce it significantly, you would guarantee MRS/DMS flushing would be prioritized over compactions. Admittedly, it will use the WAL bytes anchored to prioritize ops, but that should still work out to flush larger mem-stores in insert-mostly workloads. > kudu don't always prefer to flush MRS/DMS that anchor more memory > - > > Key: KUDU-3180 > URL: https://issues.apache.org/jira/browse/KUDU-3180 > Project: Kudu > Issue Type: Improvement >Reporter: YifanZhang >Priority: Major > Attachments: image-2020-08-04-20-26-53-749.png, > image-2020-08-04-20-28-00-665.png > > > Current time-based flush policy always give a flush op a high score if we > haven't flushed for the tablet in a long time, that may lead to starvation of > ops that could free more memory. > We set -flush_threshold_mb=32, -flush_threshold_secs=1800 in a cluster, and > find that some small MRS/DMS flushes has a higher perf score than big MRS/DMS > flushes and compactions, which seems not so reasonable. > !image-2020-08-04-20-26-53-749.png|width=1424,height=317!!image-2020-08-04-20-28-00-665.png|width=1414,height=327! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KUDU-3180) kudu don't always prefer to flush MRS/DMS that anchor more memory
[ https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171714#comment-17171714 ] Andrew Wong edited comment on KUDU-3180 at 8/5/20, 7:36 PM: I've been discussing with [~aserbin] and [~granthenke] about this problem, and one thing that stands out about the issue here is that it isn't obvious what quantifiable values we should optimize for here. I think there are a few things to care about: * Insert/update performance * Memory used by mem-stores * Space anchored by WALs * To some extent, write amplification and size of output disk-stores These values don't explicitly trade off with one another, which makes it a bit difficult to determine the correct heuristic for when to flush mem-stores. Some different solutions we've been discussing are: * Defining some cost function based on the time since last flush AND memory used. This might be an improvement over today's policy, which uses a simple branching heuristic to pick based on time since last flush OR memory used. * Always using the WAL bytes anchored to determine what to flush. This has the benefit of somewhat taking into account both the time since last flush and memory used, in the sense that older mem-stores will tend to anchor more WAL bytes, and larger mem-stores will also tend to anchor more WAL bytes. This has the added benefit of keeping the "space anchored by WALs" value in mind, so we don't end up with something like KUDU-3002. * Update the policy based on the current amount of space used / memory used to pick the "right" values to trade off. E.g. if we are running low on WAL disk space, prioritize based on WAL bytes anchored; if we are running low on memory, prioritize based on memory used, etc. Before exploring the solution space further, it'd be better to more clearly define the problem at hand. [~zhangyifan27] what are the values that look off to you? What tradeoffs would you prefer to make in filing this jira? Would something as simple as lowering {{-flush_threshold_mb}} or increasing {{-flush_threshold_secs}} help you? was (Author: andrew.wong): I've been discussing with [~aserbin] and [~granthenke] about this problem, and one thing that stands out about the issue here is that it isn't obvious what quantifiable values we should optimize for here. I think there are a few things to care about: * Insert/update performance * Memory used by mem-stores * Space anchored by WALs * To some extent, write amplification and size of output disk-stores These values don't explicitly trade off with one another, which makes it a bit difficult to determine the correct heuristic for when to flush mem-stores. Some different solutions we've been discussing are: * Defining some cost function based on the time since last flush AND memory used. This might be an improvement over today's policy, which uses a simple branching heuristic to pick based on time since last flush OR memory used. * Always using the WAL bytes anchored to determine what to flush. This has the benefit of somewhat taking into account both the time since last flush and memory used, in the sense that older mem-stores will tend to anchor more WAL bytes, and larger mem-stores will also tend to anchor more WAL bytes. This has the added benefit of keeping the "space anchored by WALs" value in mind, so we don't end up with something like KUDU-3002. * Update the policy based on the current amount of space used / memory used to pick the "right" values to trade off. E.g. 
if we are running low on WAL disk space, prioritize based on WAL bytes anchored; if we are running low on memory, prioritize based on memory used, etc. Before exploring the solution space further, it'd be better to more clearly define the problem at hand. [~zhangyifan27] what are the values that look off to you? What tradeoffs would you prefer to make in filing this jira? Would something as simple as lowering {{-flush_threshold_mb}} or increasing {{-flush_threshold_secs}} help you? > kudu don't always prefer to flush MRS/DMS that anchor more memory > - > > Key: KUDU-3180 > URL: https://issues.apache.org/jira/browse/KUDU-3180 > Project: Kudu > Issue Type: Bug >Reporter: YifanZhang >Priority: Major > Attachments: image-2020-08-04-20-26-53-749.png, > image-2020-08-04-20-28-00-665.png > > > Current time-based flush policy always give a flush op a high score if we > haven't flushed for the tablet in a long time, that may lead to starvation of > ops that could free more memory. > We set -flush_threshold_mb=32, -flush_threshold_secs=1800 in a cluster, and > find that some small MRS/DMS flushes has a higher perf score than big MRS/DMS > flushes and compactions, which seems not so reasonable. > !image-2020-08-04-20-26-53-749.png|width=1424,height=317!!image-2020-08-04-20-28-00-665.png|width=1414,height=327! -- This message was sent by Atlassian Jira (v8.3.4#803005)
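One way to picture the "time since last flush AND memory used" option, and the max(memory_size, time_since_last_flush) suggestion quoted earlier in the thread, is a score that takes the stronger of a size signal and a capped age signal: a huge mem-store and a very old mem-store can both win, but age alone cannot starve larger flushes indefinitely. The sketch below is purely illustrative; the thresholds, normalization, cap, and function name are assumptions, not Kudu's actual flush-scoring code.

{code:cpp}
// Illustrative shape for a blended flush heuristic -- not Kudu's implementation.
#include <algorithm>
#include <cstdint>

double FlushPerfScore(int64_t mem_bytes,
                      double secs_since_last_flush,
                      int64_t flush_threshold_mb = 32,
                      double flush_threshold_secs = 1800.0) {
  // 1.0 once the mem-store reaches the size threshold; keeps growing above it,
  // so genuinely large stores dominate.
  const double size_score =
      static_cast<double>(mem_bytes) / (flush_threshold_mb * 1024.0 * 1024.0);
  // 1.0 once the store is "too old"; capped so age alone cannot outscore a
  // large store forever (the KUDU-3002 regression concern raised earlier).
  const double time_score =
      std::min(secs_since_last_flush / flush_threshold_secs, 2.0);
  // max(...) so that either signal alone can push a flush to the front.
  return std::max(size_score, time_score);
}
{code}

For example, with the defaults above a 64 MiB mem-store flushed 5 minutes ago scores 2.0 on size, while a 1 MiB mem-store that has not flushed for 90 minutes caps out at 2.0 on age, so the two compete on comparable footing instead of the age term always winning.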