[jira] [Assigned] (KUDU-3357) Allow servers to not use the advertised RPC addresses

2022-04-05 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong reassigned KUDU-3357:
-

Assignee: Andrew Wong

> Allow servers to not use the advertised RPC addresses
> -
>
> Key: KUDU-3357
> URL: https://issues.apache.org/jira/browse/KUDU-3357
> Project: Kudu
>  Issue Type: Improvement
>  Components: rpc
>Reporter: Andrew Wong
>Assignee: Andrew Wong
>Priority: Major
>
> When Kudu servers are deployed within an internal network with internal 
> hostnames (e.g. in a k8s cluster), and Kudu clients are deployed outside of 
> this network with a mapping of external traffic to internal ports (e.g. with 
> a load balancer), it’s unclear how to route the Kudu client to the servers 
> without having all traffic (including RPCs between servers) use publicly 
> accessible addresses.
> For instance, all servers could be configured with the 
> --rpc_advertised_addresses flag. However, since these addresses are 
> used to register servers with the Master, not only would they be used to 
> indicate where clients should look for data, but they would also be used to 
> indicate where replicas should heartbeat to other replicas. This would induce 
> a great deal of traffic on the load balancer.
> We should consider allowing “internal” (i.e. tserver and master) traffic to 
> bypass advertised addresses and use an alternate address. Or at the very 
> least, introduce a policy for selecting which advertised address to use 
> depending on what is available (currently, we always use the first in the list).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (KUDU-3061) Balance tablet leaders across TServers

2022-03-16 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong reassigned KUDU-3061:
-

Assignee: shenxingwuying

> Balance tablet leaders across TServers
> --
>
> Key: KUDU-3061
> URL: https://issues.apache.org/jira/browse/KUDU-3061
> Project: Kudu
>  Issue Type: New Feature
>  Components: api, tablet
>Affects Versions: 1.11.1
>Reporter: Adam Voga
>Assignee: shenxingwuying
>Priority: Major
>  Labels: performance, roadmap-candidate, scalability
>
> The number of leader tablets per tablet server can become imbalanced over 
> time, putting additional pressure on a few nodes.
> A CLI tool or an extension to the existing balancer should be added to take 
> care of this.
> Currently the only option is running leader_step_down manually. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KUDU-3357) Allow servers to not use the advertised RPC addresses

2022-02-25 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3357:
-

 Summary: Allow servers to not use the advertised RPC addresses
 Key: KUDU-3357
 URL: https://issues.apache.org/jira/browse/KUDU-3357
 Project: Kudu
  Issue Type: Improvement
  Components: rpc
Reporter: Andrew Wong


When Kudu servers are deployed within an internal network with internal 
hostnames (e.g. in a k8s cluster), and Kudu clients are deployed outside of 
this network with a mapping of external traffic to internal ports (e.g. with a 
load balancer), it’s unclear how to route the Kudu client to the servers 
without having all traffic (including RPCs between servers) use publicly 
accessible addresses.

For instance, all servers could be configured with the 
--rpc_advertised_addresses flag. However, since these addresses are 
used to register servers with the Master, not only would they be used to 
indicate where clients should look for data, but they would also be used to 
indicate where replicas should heartbeat to other replicas. This would induce a 
great deal of traffic on the load balancer.

We should consider allowing “internal” (i.e. tserver and master) traffic to 
bypass advertised addresses and use an alternate address. Or at the very least, 
introduce a policy for selecting which advertised address to use depending on 
what is available (currently, we always use the first in the list).
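
As a purely illustrative sketch of the second idea (a selection policy), and 
not anything Kudu implements today, the snippet below picks an advertised 
address whose hostname matches the caller's domain suffix and otherwise falls 
back to the first entry; the HostPort type and the matching rule are 
simplified stand-ins.

{code}
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Simplified stand-in for Kudu's HostPort; for illustration only.
struct HostPort {
  std::string host;
  uint16_t port;
};

// Hypothetical policy: pick the first advertised address whose host ends with
// the caller's domain suffix (e.g. ".svc.cluster.local" for in-cluster peers);
// otherwise fall back to the first entry, which matches today's behavior.
const HostPort& SelectAdvertisedAddress(const std::vector<HostPort>& advertised,
                                        const std::string& suffix) {
  for (const auto& hp : advertised) {
    if (hp.host.size() >= suffix.size() &&
        hp.host.compare(hp.host.size() - suffix.size(), suffix.size(),
                        suffix) == 0) {
      return hp;
    }
  }
  return advertised.front();  // today: always the first in the list
}

int main() {
  std::vector<HostPort> advertised = {
      {"lb.example.com", 7051},               // externally reachable address
      {"ts-0.kudu.svc.cluster.local", 7051},  // internal k8s address
  };
  const HostPort& internal =
      SelectAdvertisedAddress(advertised, ".svc.cluster.local");
  std::cout << "internal traffic -> " << internal.host << ":" << internal.port
            << std::endl;
  return 0;
}
{code}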




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KUDU-3353) Support setnx semantic on column

2022-02-18 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494849#comment-17494849
 ] 

Andrew Wong commented on KUDU-3353:
---

Upon first reading this, I thought it sounded similar to INSERT_IGNORE, but 
after letting it digest a bit, it seems a bit different, since it deals with 
individual cells of an updated row rather than the entire row.

The tricky thing here, I think, is that we want to evaluate the value of an 
updated column before determining whether to apply the update. This is not 
something Kudu currently supports – we currently only check primary key 
presence before applying the row. And note that determining the old value may 
entail opening up several delta files. While not untenable (e.g., we still 
open delta files for the presence check to determine if a row was deleted), 
that is something that would need to be implemented as a part of this 
operation.

Another thought: would it make sense to introduce this as a new write op 
entirely, some SETNX op (similar to INSERT_IGNORE), rather than as a part of 
the schema?
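
For illustration only, here is a minimal sketch of the per-cell "set if null" 
decision being discussed (whether it ends up as a column property or as a 
hypothetical SETNX-style op); std::optional stands in for a nullable cell, and 
none of this reflects Kudu's actual storage types.

{code}
#include <iostream>
#include <optional>
#include <string>

// Stand-in for a nullable cell value; not Kudu's actual representation.
using Cell = std::optional<std::string>;

// Hypothetical "set if null" semantics: the incoming value is applied only
// when the base cell is currently NULL; otherwise the existing value wins
// (the opposite of a plain UPSERT, where the incoming value always wins).
Cell ApplySetIfNull(const Cell& base, const Cell& incoming) {
  return base.has_value() ? base : incoming;
}

int main() {
  Cell create_time;  // NULL before the first ingest
  create_time = ApplySetIfNull(create_time, Cell{"2022-02-18T00:00:00"});
  // A later UPSERT carrying a new value must not overwrite "create time".
  create_time = ApplySetIfNull(create_time, Cell{"2022-02-19T00:00:00"});
  std::cout << *create_time << std::endl;  // prints the original timestamp
  return 0;
}
{code}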

> Support setnx semantic on column
> 
>
> Key: KUDU-3353
> URL: https://issues.apache.org/jira/browse/KUDU-3353
> Project: Kudu
>  Issue Type: New Feature
>  Components: api, server
>Reporter: Yingchun Lai
>Priority: Major
>
> h1. Motivation
> In some usage scenarios, a Kudu table has a column with "create time" 
> semantics, meaning it represents the creation timestamp of the row. The other 
> columns have the usual semantics, for example user properties like age and 
> address.
> The upstream system and the Kudu user don't know whether a row already 
> exists, and every cell value is the latest ingested from, for example, an 
> event stream.
> Without the "create time" column, the Kudu user can use UPSERT operations to 
> write data to the table, and every column carrying data overwrites the old 
> data. But with the "create time" column, that cell would be overwritten by 
> subsequent UPSERT ops, which is not what we expect.
> To achieve the goal, we have to read the column back to judge whether it is 
> NULL: if it is NULL, we can fill the cell; if not, we drop it from the data 
> before the UPSERT to avoid overwriting "create time".
> This is expensive; is there a way to avoid a read from Kudu?
> h1. Resolution
> We can implement a column schema with "update if null" semantics: cell data 
> in the changelist updates the base data only if the latter is NULL, and the 
> update is ignored if it is not NULL.
> So we can use Kudu as before, only defining the column as "update if null" 
> when creating the table or adding the column.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports

2021-11-30 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451489#comment-17451489
 ] 

Andrew Wong commented on KUDU-3326:
---

That just about summarizes everything. Thanks for the recap!

> Add Soft Delete Table Supports
> --
>
> Key: KUDU-3326
> URL: https://issues.apache.org/jira/browse/KUDU-3326
> Project: Kudu
>  Issue Type: New Feature
>  Components: api, CLI, client, master, test
>Reporter: dengke
>Assignee: dengke
>Priority: Major
>
> h2. Brief description:
> Soft delete means that the Kudu system will not delete a table immediately 
> after receiving the command to delete it. Instead, it will mark the table and 
> set a validity period. After the validity period, it will check again whether 
> the table really needs to be deleted.
> This feature makes it possible to restore data conveniently and in a timely 
> manner in the case of accidental deletion.
> h2. Relevant modification points:
> 1. After deleting a table, the original table will be renamed to 
> KUDU_TRASHED:<timestamp>:<original table name>, becoming a trash table.
> 2. The contents of the trash table are exactly the same as those of the 
> original table. Although it cannot be renamed, altered, or deleted directly, 
> it can be read and written normally. The trash table will be retained for a 
> period of time by default (such as 7 days, configurable through parameters). 
> The compaction priority of the trash table will be set to the lowest to save 
> system resources.
> 3. The master needs a new thread to process expired trash tables and perform 
> the real deletion.
> 4. It is allowed to create a table with the same name as the original table, 
> and the newly created table with the same name can be deleted normally.
> 5. It is allowed to recall deleted tables, except in the following two 
> situations: a table with the same original name already exists, or the trash 
> table has expired.
> 6. KUDU_TRASHED is a string reserved for the system. Users are not allowed to 
> create tables whose names start with KUDU_TRASHED.
> 7. Adapt the Kudu CLI tools to soft deletion.
> 8. Adapt the Java API to soft deletion.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports

2021-11-29 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450798#comment-17450798
 ] 

Andrew Wong commented on KUDU-3326:
---

Alternatively, to avoid the whole question of naming convention, rather than 
relying on table renames (which incurs some IO on the tablet servers to persist 
metadata), we could introduce a separate list of trashed tables to the catalog 
manager that isn't visible to users via normal {{ListTable}} and {{OpenTable}} 
calls. When loading all the tables to memory, based on whether the table has a 
"trashed_time" field, Kudu could move the table into a separate container (i.e. 
not {{table_ids_map_}} or {{normalized_table_names_map_}}). When recalling, we 
could have users recall by specifying a table ID, and potentially giving a new 
name.
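
A very rough sketch of the bookkeeping that idea implies, with illustrative 
types and field names rather than Kudu's actual catalog manager structures: 
tables carrying a trashed timestamp are loaded into a separate map keyed by 
table ID instead of the normal name map, and a recall moves them back under a 
(possibly new) name.

{code}
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Illustrative stand-in for a catalog entry; not Kudu's TableInfo.
struct TableEntry {
  std::string id;
  std::string name;
  std::optional<int64_t> trashed_time;  // set when the table was soft-deleted
};

struct CatalogMaps {
  // Visible via normal ListTables/OpenTable calls.
  std::unordered_map<std::string, TableEntry> tables_by_name;
  // Hidden from normal calls; recalled by table ID.
  std::unordered_map<std::string, TableEntry> trashed_by_id;
};

// On load, route each table into the visible map or the trashed map depending
// on whether it carries a trashed_time marker.
void LoadTable(CatalogMaps& maps, TableEntry entry) {
  if (entry.trashed_time.has_value()) {
    const std::string id = entry.id;
    maps.trashed_by_id.emplace(id, std::move(entry));
  } else {
    const std::string name = entry.name;
    maps.tables_by_name.emplace(name, std::move(entry));
  }
}

// Recall by table ID, optionally giving the table a new name.
bool RecallTable(CatalogMaps& maps, const std::string& id,
                 const std::string& new_name) {
  auto it = maps.trashed_by_id.find(id);
  if (it == maps.trashed_by_id.end()) return false;           // not trashed
  if (maps.tables_by_name.count(new_name) > 0) return false;  // name taken
  TableEntry entry = std::move(it->second);
  maps.trashed_by_id.erase(it);
  entry.trashed_time.reset();
  entry.name = new_name;
  maps.tables_by_name.emplace(new_name, std::move(entry));
  return true;
}

int main() {
  CatalogMaps maps;
  LoadTable(maps, {"id-1", "A", std::nullopt});
  LoadTable(maps, {"id-0", "A", 1638144000});  // an older, trashed "A"
  std::cout << RecallTable(maps, "id-0", "A_restored") << std::endl;  // 1
  return 0;
}
{code}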

> Add Soft Delete Table Supports
> --
>
> Key: KUDU-3326
> URL: https://issues.apache.org/jira/browse/KUDU-3326
> Project: Kudu
>  Issue Type: New Feature
>  Components: api, CLI, client, master, test
>Reporter: dengke
>Assignee: dengke
>Priority: Major
>
> h2. Brief description:
> Soft delete means that the Kudu system will not delete a table immediately 
> after receiving the command to delete it. Instead, it will mark the table and 
> set a validity period. After the validity period, it will check again whether 
> the table really needs to be deleted.
> This feature makes it possible to restore data conveniently and in a timely 
> manner in the case of accidental deletion.
> h2. Relevant modification points:
> 1. After deleting a table, the original table will be renamed to 
> KUDU_TRASHED:<timestamp>:<original table name>, becoming a trash table.
> 2. The contents of the trash table are exactly the same as those of the 
> original table. Although it cannot be renamed, altered, or deleted directly, 
> it can be read and written normally. The trash table will be retained for a 
> period of time by default (such as 7 days, configurable through parameters). 
> The compaction priority of the trash table will be set to the lowest to save 
> system resources.
> 3. The master needs a new thread to process expired trash tables and perform 
> the real deletion.
> 4. It is allowed to create a table with the same name as the original table, 
> and the newly created table with the same name can be deleted normally.
> 5. It is allowed to recall deleted tables, except in the following two 
> situations: a table with the same original name already exists, or the trash 
> table has expired.
> 6. KUDU_TRASHED is a string reserved for the system. Users are not allowed to 
> create tables whose names start with KUDU_TRASHED.
> 7. Adapt the Kudu CLI tools to soft deletion.
> 8. Adapt the Java API to soft deletion.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports

2021-11-29 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450792#comment-17450792
 ] 

Andrew Wong commented on KUDU-3326:
---

Sorry for the late response here!
{quote}So in your opinion, only one trash table is allowed to exist to meet our 
design requirements?
{quote}
I wouldn't be against it, at least with the caveats mentioned.

That said, while we're thinking about design here, I do think it wouldn't be 
too difficult to come up with a naming convention that does satisfy uniqueness 
constraints. For instance, we could add the creation timestamp to the trashed 
table's name, or better yet, the table ID. E.g. instead of KUDU_TRASH:A, we 
could name it {{KUDU_TRASH:<timestamp>:A}} or {{KUDU_TRASH:<table ID>:A}}.

{quote}
This function can be distinguished by adding command parameters, or would it 
be more convenient to mark the trash table directly during list?
{quote}

I think adding an argument to the {{ListTables()}} API (or adding a new API 
with the argument) that opts into showing trashed tables seems reasonable. I 
think the default should be to not show them, though, especially as they will 
not be visible to Impala.
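
To make the uniqueness idea concrete, a small illustrative sketch of building 
and parsing a trashed name that embeds the table ID; the KUDU_TRASHED prefix 
and the ":" separator are assumptions taken from the description below, not 
actual Kudu constants.

{code}
#include <iostream>
#include <optional>
#include <string>
#include <utility>

// Assumed prefix and separator, per the proposal quoted below; these are not
// actual Kudu constants.
constexpr const char* kTrashedPrefix = "KUDU_TRASHED:";

// Embed the table ID so two trashed tables that shared a name stay distinct.
std::string MakeTrashedName(const std::string& table_id,
                            const std::string& name) {
  return std::string(kTrashedPrefix) + table_id + ":" + name;
}

// Recover (table_id, original_name) from a trashed name, if it fits the scheme.
std::optional<std::pair<std::string, std::string>> ParseTrashedName(
    const std::string& s) {
  const std::string prefix(kTrashedPrefix);
  if (s.compare(0, prefix.size(), prefix) != 0) return std::nullopt;
  const size_t sep = s.find(':', prefix.size());
  if (sep == std::string::npos) return std::nullopt;
  return std::make_pair(s.substr(prefix.size(), sep - prefix.size()),
                        s.substr(sep + 1));
}

int main() {
  const std::string trashed = MakeTrashedName("4c2a8e", "A");
  std::cout << trashed << std::endl;  // KUDU_TRASHED:4c2a8e:A
  if (auto parsed = ParseTrashedName(trashed)) {
    std::cout << parsed->first << " / " << parsed->second << std::endl;
  }
  return 0;
}
{code}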



> Add Soft Delete Table Supports
> --
>
> Key: KUDU-3326
> URL: https://issues.apache.org/jira/browse/KUDU-3326
> Project: Kudu
>  Issue Type: New Feature
>  Components: api, CLI, client, master, test
>Reporter: dengke
>Assignee: dengke
>Priority: Major
>
> h2. Brief description:
> Soft delete means that the Kudu system will not delete a table immediately 
> after receiving the command to delete it. Instead, it will mark the table and 
> set a validity period. After the validity period, it will check again whether 
> the table really needs to be deleted.
> This feature makes it possible to restore data conveniently and in a timely 
> manner in the case of accidental deletion.
> h2. Relevant modification points:
> 1. After deleting a table, the original table will be renamed to 
> KUDU_TRASHED:<timestamp>:<original table name>, becoming a trash table.
> 2. The contents of the trash table are exactly the same as those of the 
> original table. Although it cannot be renamed, altered, or deleted directly, 
> it can be read and written normally. The trash table will be retained for a 
> period of time by default (such as 7 days, configurable through parameters). 
> The compaction priority of the trash table will be set to the lowest to save 
> system resources.
> 3. The master needs a new thread to process expired trash tables and perform 
> the real deletion.
> 4. It is allowed to create a table with the same name as the original table, 
> and the newly created table with the same name can be deleted normally.
> 5. It is allowed to recall deleted tables, except in the following two 
> situations: a table with the same original name already exists, or the trash 
> table has expired.
> 6. KUDU_TRASHED is a string reserved for the system. Users are not allowed to 
> create tables whose names start with KUDU_TRASHED.
> 7. Adapt the Kudu CLI tools to soft deletion.
> 8. Adapt the Java API to soft deletion.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (KUDU-38) bootstrap should not replay logs that are known to be fully flushed

2021-11-29 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong reassigned KUDU-38:
---

Assignee: Andrew Wong  (was: Todd Lipcon)

> bootstrap should not replay logs that are known to be fully flushed
> ---
>
> Key: KUDU-38
> URL: https://issues.apache.org/jira/browse/KUDU-38
> Project: Kudu
>  Issue Type: Sub-task
>  Components: tablet
>Affects Versions: M3
>Reporter: Todd Lipcon
>Assignee: Andrew Wong
>Priority: Major
>  Labels: data-scalability, roadmap-candidate, startup-time
>
> Currently the bootstrap process will process all of the log segments, 
> including those that can be trivially determined to contain only durable 
> edits. This makes startup unnecessarily slow.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports

2021-11-10 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442039#comment-17442039
 ] 

Andrew Wong commented on KUDU-3326:
---

{quote}Can we add a parameter to decide whether to eliminate (redundant) trash 
tables when deleting tables? If so, your elimination method will be very 
appropriate. Otherwise, the deletion will fail.{quote}

I guess this is reasonable, to err on the side of less aggressive deletion. So 
if we trash table A, then create a new table A, and then trash A before 
KUDU_TRASH:A is deleted, the user would be met with an error.

{quote}we can add a parameter to control the number of retained trash 
tables{quote}

We can, but how would the naming convention for the multiple trashed tables 
work? And if we have multiple trashed tables of the same name, should we be 
able to recall any trashed version of a table?

{quote}When HMS-synchronization is enabled,considering that HMS only manages 
metadata and has no impact on the timing data of kudu, it can be deleted 
directly from HMS. Rebuild it on the HMS during recall.{quote}

Sounds reasonable. I guess we can iron out the HMS features separately, after 
the initial patch is merged.

We'll also need to make sure that there's no confusion when listing tables. 
E.g. when listing, we shouldn't show any trashed tables unless the user 
explicitly asks to include trashed tables in the result.

> Add Soft Delete Table Supports
> --
>
> Key: KUDU-3326
> URL: https://issues.apache.org/jira/browse/KUDU-3326
> Project: Kudu
>  Issue Type: New Feature
>  Components: api, CLI, client, master, test
>Reporter: dengke
>Assignee: dengke
>Priority: Major
>
> h2. Brief description:
> Soft delete means that the Kudu system will not delete a table immediately 
> after receiving the command to delete it. Instead, it will mark the table and 
> set a validity period. After the validity period, it will check again whether 
> the table really needs to be deleted.
> This feature makes it possible to restore data conveniently and in a timely 
> manner in the case of accidental deletion.
> h2. Relevant modification points:
> 1. After deleting a table, the original table will be renamed to 
> KUDU_TRASHED:<timestamp>:<original table name>, becoming a trash table.
> 2. The contents of the trash table are exactly the same as those of the 
> original table. Although it cannot be renamed, altered, or deleted directly, 
> it can be read and written normally. The trash table will be retained for a 
> period of time by default (such as 7 days, configurable through parameters). 
> The compaction priority of the trash table will be set to the lowest to save 
> system resources.
> 3. The master needs a new thread to process expired trash tables and perform 
> the real deletion.
> 4. It is allowed to create a table with the same name as the original table, 
> and the newly created table with the same name can be deleted normally.
> 5. It is allowed to recall deleted tables, except in the following two 
> situations: a table with the same original name already exists, or the trash 
> table has expired.
> 6. KUDU_TRASHED is a string reserved for the system. Users are not allowed to 
> create tables whose names start with KUDU_TRASHED.
> 7. Adapt the Kudu CLI tools to soft deletion.
> 8. Adapt the Java API to soft deletion.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KUDU-2681) Account for already-running tasks before sending new ones upon processing tablet reports

2021-11-08 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-2681.
---
Fix Version/s: 1.12.0
   Resolution: Duplicate

This seems like a duplicate of KUDU-2992.

> Account for already-running tasks before sending new ones upon processing 
> tablet reports
> 
>
> Key: KUDU-2681
> URL: https://issues.apache.org/jira/browse/KUDU-2681
> Project: Kudu
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.7.1
>Reporter: Andrew Wong
>Priority: Major
>  Labels: scalability, supportability
> Fix For: 1.12.0
>
>
> I've seen a case where the master will reschedule the same delete tablet task 
> for a given tablet multiple times, e.g. because it received a new tablet 
> report that the tablet still exists on a given tserver. This results in 
> significant log-spam, and ends up sending excessive RPCs to the tablet 
> servers. Here are some master logs demonstrating this (note the repeated 
> attempt numbers):
>  
> {{I0129 05:09:43.918886 22190 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 
> (server:7050) (Replica with old config index 3048677 (current committed 
> config index is 3054594))}}
> {{W0129 05:09:43.919509 22190 catalog_manager.cc:2892] TS 
> 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already 
> present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already 
> in progress: opening tablet}}
> {{I0129 05:09:43.919517 22190 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for 
> TS=90369522338b4763ae25dd0161d6e548 with a delay of 8226 ms (attempt = 10)}}
> {{I0129 05:09:43.960479 22190 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 
> (server:7050) (Replica with old config index 3048677 (current committed 
> config index is 3054594))}}
> {{W0129 05:09:43.961150 22190 catalog_manager.cc:2892] TS 
> 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already 
> present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already 
> in progress: opening tablet}}
> {{I0129 05:09:43.961158 22190 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for 
> TS=90369522338b4763ae25dd0161d6e548 with a delay of 8235 ms (attempt = 10)}}
> {{I0129 05:09:44.016152 22190 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 
> (server:7050) (Replica with old config index 3048677 (current committed 
> config index is 3054594))}}
> {{W0129 05:09:44.016383 22190 catalog_manager.cc:2892] TS 
> 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already 
> present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already 
> in progress: opening tablet}}
> {{I0129 05:09:44.016391 22190 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for 
> TS=90369522338b4763ae25dd0161d6e548 with a delay of 8206 ms (attempt = 10)}}
> {{I0129 05:09:44.226428 22190 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 
> (server:7050) (Replica with old config index 3048677 (current committed 
> config index is 3054594))}}
> {{W0129 05:09:44.226753 22190 catalog_manager.cc:2892] TS 
> 90369522338b4763ae25dd0161d6e548 (server:7050): delete failed for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb with error code TABLET_NOT_RUNNING: Already 
> present: State transition of tablet 1d75a2458b544c6ea01fb6ccb238ebbb already 
> in progress: opening tablet}}
> {{I0129 05:09:44.226773 22190 catalog_manager.cc:2700] Scheduling retry of 
> 1d75a2458b544c6ea01fb6ccb238ebbb Delete Tablet RPC for 
> TS=90369522338b4763ae25dd0161d6e548 with a delay of 8207 ms (attempt = 10)}}
> {{I0129 05:09:44.234709 22190 catalog_manager.cc:2922] Sending 
> DeleteTablet(TABLET_DATA_TOMBSTONED) for tablet 
> 1d75a2458b544c6ea01fb6ccb238ebbb on 90369522338b4763ae25dd0161d6e548 
> (server:7050) (Replica with old config index 3048677 (current committed 
> config index is 3054594))}}
> {{W0129 05:09:44.234923 22190 catalog_manager.cc:2892] TS 
> 90369522338b4763ae25dd0161d6e548 (server

[jira] [Assigned] (KUDU-3326) Add Soft Delete Table Supports

2021-10-13 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong reassigned KUDU-3326:
-

Assignee: dengke

> Add Soft Delete Table Supports
> --
>
> Key: KUDU-3326
> URL: https://issues.apache.org/jira/browse/KUDU-3326
> Project: Kudu
>  Issue Type: New Feature
>  Components: api, CLI, client, master, test
>Reporter: dengke
>Assignee: dengke
>Priority: Major
>
> h2. Brief description:
> Soft delete means that the Kudu system will not delete a table immediately 
> after receiving the command to delete it. Instead, it will mark the table and 
> set a validity period. After the validity period, it will check again whether 
> the table really needs to be deleted.
> This feature makes it possible to restore data conveniently and in a timely 
> manner in the case of accidental deletion.
> h2. Relevant modification points:
> 1. After deleting a table, the original table will be renamed to 
> KUDU_TRASHED:<timestamp>:<original table name>, becoming a trash table.
> 2. The contents of the trash table are exactly the same as those of the 
> original table. Although it cannot be renamed, altered, or deleted directly, 
> it can be read and written normally. The trash table will be retained for a 
> period of time by default (such as 7 days, configurable through parameters). 
> The compaction priority of the trash table will be set to the lowest to save 
> system resources.
> 3. The master needs a new thread to process expired trash tables and perform 
> the real deletion.
> 4. It is allowed to create a table with the same name as the original table, 
> and the newly created table with the same name can be deleted normally.
> 5. It is allowed to recall deleted tables, except in the following two 
> situations: a table with the same original name already exists, or the trash 
> table has expired.
> 6. KUDU_TRASHED is a string reserved for the system. Users are not allowed to 
> create tables whose names start with KUDU_TRASHED.
> 7. Adapt the Kudu CLI tools to soft deletion.
> 8. Adapt the Java API to soft deletion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3326) Add Soft Delete Table Supports

2021-10-13 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428466#comment-17428466
 ] 

Andrew Wong commented on KUDU-3326:
---

Thanks for the contribution so far! I do have some questions regarding the 
design, as well as some thoughts on how they might be addressed.

*How does this work when a table is trashed and recreated? And then trashed 
again before the reservation is complete?*
In this case, it may be worth synchronously deleting the table (and its trashed 
data) and creating the table with the same non-trash name. That way, we only 
ever have one trashed table at a time, and the first scenario in #5 is 
addressed. This runs the risk of losing data if, e.g. the create fails for 
whatever reason. But I think that is a reasonable behavior, since the user has 
expressed the desire to forget about the old table by creating a new table for 
the same name.

*How does this work when HMS-synchronization is enabled? The HMS doesn't allow 
for the ":" character. Are trashed tables deleted from the HMS immediately? Or 
only when fully deleted?*
When deleting a table, we should still propagate the deletion to the HMS 
(rather than a table rename) immediately. Upon recalling the table, we should 
recreate the table in the HMS. When running the {{hms check}} and {{hms fix}} 
tools, we should probably ignore trashed tables.

For that matter, for other tools and APIs (list, open table, etc), we may want 
to ignore trashed tables as well unless explicitly requested, and ensure the 
only thing a user can do with a trashed table is recall it.

> Add Soft Delete Table Supports
> --
>
> Key: KUDU-3326
> URL: https://issues.apache.org/jira/browse/KUDU-3326
> Project: Kudu
>  Issue Type: New Feature
>  Components: api, CLI, client, master, test
>Reporter: dengke
>Priority: Major
>
> h2. Brief description:
> Soft delete means that the Kudu system will not delete a table immediately 
> after receiving the command to delete it. Instead, it will mark the table and 
> set a validity period. After the validity period, it will check again whether 
> the table really needs to be deleted.
> This feature makes it possible to restore data conveniently and in a timely 
> manner in the case of accidental deletion.
> h2. Relevant modification points:
> 1. After deleting a table, the original table will be renamed to 
> KUDU_TRASHED:<timestamp>:<original table name>, becoming a trash table.
> 2. The contents of the trash table are exactly the same as those of the 
> original table. Although it cannot be renamed, altered, or deleted directly, 
> it can be read and written normally. The trash table will be retained for a 
> period of time by default (such as 7 days, configurable through parameters). 
> The compaction priority of the trash table will be set to the lowest to save 
> system resources.
> 3. The master needs a new thread to process expired trash tables and perform 
> the real deletion.
> 4. It is allowed to create a table with the same name as the original table, 
> and the newly created table with the same name can be deleted normally.
> 5. It is allowed to recall deleted tables, except in the following two 
> situations: a table with the same original name already exists, or the trash 
> table has expired.
> 6. KUDU_TRASHED is a string reserved for the system. Users are not allowed to 
> create tables whose names start with KUDU_TRASHED.
> 7. Adapt the Kudu CLI tools to soft deletion.
> 8. Adapt the Java API to soft deletion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-1620) Consensus peer proxy hostnames should be reresolved on failure

2021-10-12 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-1620.
---
Fix Version/s: 1.16.0
 Assignee: Andrew Wong
   Resolution: Fixed

> Consensus peer proxy hostnames should be reresolved on failure
> --
>
> Key: KUDU-1620
> URL: https://issues.apache.org/jira/browse/KUDU-1620
> Project: Kudu
>  Issue Type: Bug
>  Components: consensus
>Affects Versions: 1.0.0
>Reporter: Adar Dembo
>Assignee: Andrew Wong
>Priority: Major
>  Labels: docker
> Fix For: 1.16.0
>
>
> Noticed this while documenting the workflow to replace a dead master, which 
> currently bypasses Raft config changes in favor of having the replacement 
> master "masquerade" as the dead master via DNS changes.
> Internally we never rebuild consensus peer proxies in the event of network 
> failure; we assume that the peer will return at the same location. Nominally 
> this is reasonable; allowing peers to change host/port information on the fly 
> is tricky and has yet to be implemented. But, we should at least retry the 
> DNS resolution; not doing so forces the workflow to include steps to restart 
> the existing masters, which creates a (small) availability outage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-75) Allow RPC proxies to take HostPort and do DNS resolution inline with calls

2021-10-12 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-75?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-75.
-
Fix Version/s: 1.16.0
 Assignee: Andrew Wong
   Resolution: Fixed

> Allow RPC proxies to take HostPort and do DNS resolution inline with calls
> --
>
> Key: KUDU-75
> URL: https://issues.apache.org/jira/browse/KUDU-75
> Project: Kudu
>  Issue Type: Improvement
>  Components: rpc
>Affects Versions: M4
>Reporter: Todd Lipcon
>Assignee: Andrew Wong
>Priority: Major
> Fix For: 1.16.0
>
>
> A lot of RPC calls will be done against host/ports rather than ip/ports. We 
> should make the Proxy itself do the resolution inline in the async path (and 
> perhaps have some method to refresh DNS)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-1885) Master caches DNS name resolution forever

2021-10-12 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-1885.
---
Fix Version/s: 1.16.0
 Assignee: Andrew Wong
   Resolution: Fixed

> Master caches DNS name resolution forever
> -
>
> Key: KUDU-1885
> URL: https://issues.apache.org/jira/browse/KUDU-1885
> Project: Kudu
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.3.0
>Reporter: Adar Dembo
>Assignee: Andrew Wong
>Priority: Major
> Fix For: 1.16.0
>
>
> TSDescriptor::GetTSAdminProxy() and TSDescriptor::GetConsensusProxy() will 
> return the same proxy instances over and over. Normally, this is a reasonable 
> optimization. But suppose the IP address of the tserver changes (due to a 
> DHCP lease expiring or some such). Now these methods will be returning 
> unusable proxies, and there's no way to "reset" them.
> Admittedly this scenario is a little contrived: if a tserver's IP address 
> suddenly changes, a bunch of other stuff will break too. The tserver will 
> probably need to be restarted (since it's bound to a socket whose address no 
> longer exists), and consensus may be thoroughly wrecked due to built-in 
> host/port assumptions (see KUDU-418).
> An issue like this was reported by a user in Slack, who was running a master 
> and tserver on the same box. The symptom was "half-open" communication 
> between them: the tserver could heartbeat to the master, but the master could 
> not send RPCs to the tserver.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3300) Include the full path of the container in the error message

2021-10-12 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3300:
--
Labels: newbie  (was: )

> Include the full path of the container in the error message
> ---
>
> Key: KUDU-3300
> URL: https://issues.apache.org/jira/browse/KUDU-3300
> Project: Kudu
>  Issue Type: Improvement
>  Components: cfile
>Reporter: Abhishek
>Priority: Minor
>  Labels: newbie
>
> If there are multiple data directories configured, including the full Linux 
> path to the container file in the error message would help locate the file 
> without having to search for it:
> Check failed: _s.ok() Bad status: Corruption: Failed to load FS layout: Could 
> not open container 26f5cbd97dfe4cb98f49bb0a6a494e8f: Invalid magic number: 
> Expected: kuducntr, found: \000\000\020\001\030▒▒▒



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3311) Allow masters to start up with a list of masters with a diff of one from what's on disk

2021-10-12 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3311:
--
Labels: newbie  (was: )

> Allow masters to start up with a list of masters with a diff of one from 
> what's on disk
> ---
>
> Key: KUDU-3311
> URL: https://issues.apache.org/jira/browse/KUDU-3311
> Project: Kudu
>  Issue Type: Improvement
>  Components: master
>Reporter: Andrew Wong
>Priority: Major
>  Labels: newbie
>
> Now that Kudu automatically adds a master if we start up a new master 
> alongside an existing set of masters, we should also loosen the restriction 
> that the gflag matches the existing Raft config, in case users want to add a 
> master and then restart the entire cluster at the same time.
> This seems like it would be common enough for orchestration tools like CM, 
> which marks the cluster as stale and suggests a full service restart upon 
> adding a new master role.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3325) When wal is deleted, fault recovery and load balancing are abnormal

2021-10-11 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427361#comment-17427361
 ] 

Andrew Wong commented on KUDU-3325:
---

I'm curious -- why was the WAL deleted in the first place? In general, Kudu 
never expects files to be deleted out from under it. Was this caused by a 
power failure? Some disk loss? I think the best route forward would be to 
treat the tablet as failed and re-replicate from another replica if available.

> When wal is deleted, fault recovery and load balancing are abnormal
> ---
>
> Key: KUDU-3325
> URL: https://issues.apache.org/jira/browse/KUDU-3325
> Project: Kudu
>  Issue Type: Bug
>  Components: consensus
>Reporter: yejiabao_h
>Priority: Major
> Attachments: image-2021-10-06-15-36-40-996.png, 
> image-2021-10-06-15-36-53-813.png, image-2021-10-06-15-37-09-520.png, 
> image-2021-10-06-15-37-24-776.png, image-2021-10-06-15-37-42-533.png, 
> image-2021-10-06-15-37-54-782.png, image-2021-10-06-15-38-06-575.png, 
> image-2021-10-06-15-38-17-388.png, image-2021-10-06-15-38-29-176.png, 
> image-2021-10-06-15-38-39-852.png, image-2021-10-06-15-38-53-343.png, 
> image-2021-10-06-15-39-03-296.png, image-2021-10-06-19-23-51-769.png
>
>
> h3. 1. Use kudu tablet leader_step_down to create multiple WAL messages
> ./kudu tablet leader_step_down $MASTER_IP 1299f5a939d2453c83104a6db0cae3e7
> h4. wal
> !image-2021-10-06-15-36-40-996.png!
> h4. cmeta
> !image-2021-10-06-15-36-53-813.png!
> h3. 2. Stop one of the tservers to start tablet recovery, so that the 
> opid_index is flushed to cmeta
> !image-2021-10-06-15-37-09-520.png!
> h4. wal
> !image-2021-10-06-15-37-24-776.png!
> h4. cmeta
> !image-2021-10-06-15-37-42-533.png!
> h3. 3. Stop all tservers and delete the tablet WAL
> !image-2021-10-06-15-37-54-782.png!
> h3. 4. Start all tservers
> The index in the WAL starts counting from 1 again, but the opid_index 
> recorded in cmeta is still 20, the value from before the WAL was deleted.
> h4. wal
> !image-2021-10-06-15-38-06-575.png!
> h4. cmeta
> !image-2021-10-06-15-38-17-388.png!
> h3. 5. Stop a tserver to trigger fault recovery
> !image-2021-10-06-15-38-29-176.png!
> When the leader recovers a replica and the master requests a Raft config 
> change to add the new replica to the config, the request is ignored by the 
> leader replica because the opid_index is smaller than the one in cmeta.
> h3. 6. Delete all WALs
> !image-2021-10-06-15-38-39-852.png!
> h3. 7. kudu cluster rebalance
> ./kudu cluster rebalance $MASTER_IP
> !image-2021-10-06-15-38-53-343.png!
> !image-2021-10-06-15-39-03-296.png!
> The rebalance also fails when changing the Raft config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3311) Allow masters to start up with a list of masters with a diff of one from what's on disk

2021-08-26 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3311:
-

 Summary: Allow masters to start up with a list of masters with a 
diff of one from what's on disk
 Key: KUDU-3311
 URL: https://issues.apache.org/jira/browse/KUDU-3311
 Project: Kudu
  Issue Type: Improvement
  Components: master
Reporter: Andrew Wong


Now that Kudu automatically adds a master if we start up a new master alongside 
an existing set of masters, we should also loosen the restriction that the 
gflag matches the existing Raft config, in case users want to add a master and 
then restart the entire cluster at the same time.

This seems like it would be common enough for orchestration tools like CM, 
which marks the cluster as stale and suggests a full service restart upon 
adding a new master role.
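
A minimal sketch of the kind of relaxed startup check this suggests, treating 
master addresses as plain strings and making no claim about how Kudu actually 
validates the flag: accept the configured list if it matches the on-disk Raft 
config or differs from it by exactly one master.

{code}
#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Accept the configured master list if it matches what's on disk, or if the
// symmetric difference is exactly one master (one added or one removed).
bool MasterListAcceptable(const std::vector<std::string>& flag_masters,
                          const std::vector<std::string>& on_disk_masters) {
  std::set<std::string> a(flag_masters.begin(), flag_masters.end());
  std::set<std::string> b(on_disk_masters.begin(), on_disk_masters.end());
  std::vector<std::string> diff;
  std::set_symmetric_difference(a.begin(), a.end(), b.begin(), b.end(),
                                std::back_inserter(diff));
  return diff.size() <= 1;
}

int main() {
  const std::vector<std::string> on_disk = {"m1:7051", "m2:7051", "m3:7051"};
  const std::vector<std::string> flags = {"m1:7051", "m2:7051", "m3:7051",
                                          "m4:7051"};
  std::cout << MasterListAcceptable(flags, on_disk) << std::endl;  // 1
  return 0;
}
{code}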



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3310) Checksum scan results for lagging replicas can be confusing

2021-08-19 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3310:
-

 Summary: Checksum scan results for lagging replicas can be 
confusing
 Key: KUDU-3310
 URL: https://issues.apache.org/jira/browse/KUDU-3310
 Project: Kudu
  Issue Type: Improvement
  Components: ops-tooling
Reporter: Andrew Wong


When running a checksum scan, we've seen cases where the following is reported:
{code}
Error: Remote error: Service unavailable: Timed out: could not wait for desired
snapshot timestamp to be consistent: Timed out waiting for ts: P: 1621906798986764
usec, L: 0 to be safe (mode: NON-LEADER). Current safe time: P: 1621906798962044
usec, L: 0 Physical time difference: 0.025s
{code}
and this results in messages like:
{code}
Aborted: checksum scan error: 1 errors were detected
{code}

Without much context about Kudu, this makes it seem like there is some 
corruption between replicas, even though the issue is just that the replica is 
lagging a bit. We should consider either:
- allowing the wait time to be configured when running the tool, or
- rewording the result so it's clear the scan failed and no checksums were 
verified for the tablet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3290) Implement Replicate table's data to Kafka(or other Storage System)

2021-08-04 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393526#comment-17393526
 ] 

Andrew Wong commented on KUDU-3290:
---

Sorry for the late reply. I do think between the two, the learner replica seems 
more palatable, given it leaves the flexibility of decoupling the IO from the 
rest of the cluster (e.g. if we put all learners in a single tablet server). 
That said, it also increases the amount of IO done, given we have to replicate 
to an extra node. Maybe that's fine though.

{quote}
we can trigger a full scan at the timestamp and replicate data to learner, and 
then recover the appendEntries flow
{quote}

I'm not sure I understand this part, but I think you're referring to the 
conceptual equivalent of performing a tablet copy. When there aren't enough 
WALs in the leader to catch up a replica, the leader sends a tablet copy 
request to the follower, the follower is "caught up" via the tablet copy, and 
then the tablet is scanned and sent to Kafka. Is that right?

In this case, for the remote learner, I wonder if in addition to a regular 
tablet copy to the local learner, there is room here to also rely on the 
incremental scan developed for backups. If the learner knows what index it has 
replicated to Kafka, it should also be able to keep track of the timestamp 
associated with that OpId. If so, Kudu should be able to perform a differential 
scan between that timestamp and the latest timestamp in the newly copied 
replica. Of course, if the retention window is too short, this wouldn't work.

Also, in your proposal you mentioned replicas and leaders keeping track of more 
state for the sake of catching up the external service. If so, it'd be great if 
you could clarify exactly what state we would need (most recently replicated 
OpId? maybe its timestamp? anything else?) and where that state would be stored 
(with consensus metadata? somewhere else?).
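
To make that last question concrete, a sketch of the minimal per-tablet state 
a learner (or the leader on its behalf) might persist so replication to Kafka 
can resume after a failover; the struct and field names are illustrative, not 
an actual Kudu structure.

{code}
#include <cstdint>
#include <iostream>

// Illustrative only: the replication-progress state discussed above, not an
// actual Kudu structure.
struct OpId {
  int64_t term;
  int64_t index;
};

struct ExternalReplicationProgress {
  OpId last_replicated_opid;    // last op acknowledged by the external system
  uint64_t last_replicated_ts;  // Kudu timestamp of that op, for a diff scan
};

// On learner failover, a replacement replica could be tablet-copied and then
// caught up by scanning the delta between last_replicated_ts and the copied
// replica's latest timestamp (subject to history retention), rather than by
// replaying WALs that may no longer exist.
int main() {
  const ExternalReplicationProgress progress{{5, 1234}, 6723014400123456ULL};
  std::cout << "resume after term=" << progress.last_replicated_opid.term
            << " index=" << progress.last_replicated_opid.index
            << " ts=" << progress.last_replicated_ts << std::endl;
  return 0;
}
{code}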

> Implement Replicate table's data to Kafka(or other Storage System)
> --
>
> Key: KUDU-3290
> URL: https://issues.apache.org/jira/browse/KUDU-3290
> Project: Kudu
>  Issue Type: New Feature
>  Components: tserver
>Reporter: shenxingwuying
>Priority: Critical
>
> h1. Background & problem
> We use Kudu to store user profile data. Because of business requirements to 
> exchange and share data among multi-tenant users, which is reasonable in our 
> application scenario, we need to replicate data from one system to another. 
> We picked Kafka as the destination storage system because of our company's 
> current architecture.
> At this time, we have two ideas to solve it.
> h1. Two replication schemes
> Generally, a Raft group has three replicas: one is the leader and the other 
> two are followers. We would add a replica whose role is Learner. A learner 
> only receives the data and does not participate in leader election.
> The learner replica's state machine would be a plugin system, e.g.:
>  # We could support a KuduEngine, which is just a data backup, like MongoDB's 
> hidden replica.
>  # We could write to a third-party storage system, like Kafka or any other 
> system we need, and replicate data to that system using its client.
> Paxos has a learner role, which only receives data; we need such a role for 
> this new kind of membership. But in Kudu, "learner" is already used for a 
> tablet replica that is being copied (recovered). Maybe we need a new role 
> name; for now, we still use Learner to represent the new role. (We should 
> think over a new role name.)
> In our application scenario, we will replicate data to Kafka, and I will 
> explain the method.
> h2. Learner replication
>  # Add a new replica role, perhaps called learner, because Paxos has a 
> learner role that only receives data. We need such a role for this new kind 
> of membership. But in Kudu, "learner" is already used for a tablet replica 
> that is being copied (recovered). Maybe we need a new role name; for now, we 
> still use Learner to represent the new role. (We should think over a new 
> role name.)
>  # The voters' safepoint for cleaning obsolete WALs is min(leader's max WAL 
> sequence number, followers' max WAL sequence numbers, learner's max WAL 
> sequence number).
>  # The learner is not a voter and does not participate in elections.
>  # Raft can replicate data to the learner.
>  # The learner's apply process is just like a Raft follower's: the logs 
> before the committed index are replicated to Kafka, and once Kafka responds 
> OK, the apply index advances.
>  # We need a Kafka client; it would be added to Kudu as an option, maybe as 
> a compile-time option.
>  # When a kudu-tserver is decommissioned or corrupted, the learner must move 
> to a new kudu-tserver. So the leader should save the learner's applied OpId 
> and replicate it to the followers, for the learner's failover when the 
> leader goes down.
>  # The leader must save the learners apply O

[jira] [Resolved] (KUDU-3307) Update Kudu docker entry script to take data directories parameter

2021-07-30 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-3307.
---
Fix Version/s: 1.16.0
   Resolution: Fixed

> Update Kudu docker entry script to take data directories parameter
> --
>
> Key: KUDU-3307
> URL: https://issues.apache.org/jira/browse/KUDU-3307
> Project: Kudu
>  Issue Type: Improvement
>  Components: docker
>Reporter: Bankim Bhavsar
>Assignee: Andrew Wong
>Priority: Major
>  Labels: newbie
> Fix For: 1.16.0
>
>
> The current Docker entrypoint script takes the environment variable 
> {{DATA_DIR}}. However, it's expected to be a single directory, and it's 
> supplied as {{-fs_wal_dir}} rather than, as one would expect, as 
> {{-fs_data_dirs}}.
> [https://github.com/apache/kudu/blob/master/docker/kudu-entrypoint.sh#L41]
> [https://github.com/apache/kudu/blob/master/docker/kudu-entrypoint.sh#L57-L59]
> We need to update the entrypoint script to be able to supply a separate 
> configuration for data directories, and ensure these directories are created 
> either in the script or possibly within the Kudu server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KUDU-3307) Update Kudu docker entry script to take data directories parameter

2021-07-30 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong reassigned KUDU-3307:
-

Assignee: Andrew Wong  (was: Abhishek)

> Update Kudu docker entry script to take data directories parameter
> --
>
> Key: KUDU-3307
> URL: https://issues.apache.org/jira/browse/KUDU-3307
> Project: Kudu
>  Issue Type: Improvement
>  Components: docker
>Reporter: Bankim Bhavsar
>Assignee: Andrew Wong
>Priority: Major
>  Labels: newbie
>
> The current Docker entrypoint script takes the environment variable 
> {{DATA_DIR}}. However, it's expected to be a single directory, and it's 
> supplied as {{-fs_wal_dir}} rather than, as one would expect, as 
> {{-fs_data_dirs}}.
> [https://github.com/apache/kudu/blob/master/docker/kudu-entrypoint.sh#L41]
> [https://github.com/apache/kudu/blob/master/docker/kudu-entrypoint.sh#L57-L59]
> We need to update the entrypoint script to be able to supply a separate 
> configuration for data directories, and ensure these directories are created 
> either in the script or possibly within the Kudu server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3296) Synchronize the master addresses with the HMS when the Master performs a Raft config change

2021-06-21 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3296:
-

 Summary: Synchronize the master addresses with the HMS when the 
Master performs a Raft config change
 Key: KUDU-3296
 URL: https://issues.apache.org/jira/browse/KUDU-3296
 Project: Kudu
  Issue Type: Bug
  Components: master
Reporter: Andrew Wong


Today, the leader master can service config changes to add or remove replicas 
in its Raft config. It would be great if, after doing this successfully, the 
leader master also sent requests to the HMS to update all tables' metadata to 
reflect this change, given each HMS table entry includes the master addresses 
for discoverability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KUDU-3294) ksck and rebalancer tools are useless if they can't resolve any of the tserver addresses

2021-06-15 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong reassigned KUDU-3294:
-

Assignee: Andrew Wong

> ksck and rebalancer tools are useless if they can't resolve any of the 
> tserver addresses
> 
>
> Key: KUDU-3294
> URL: https://issues.apache.org/jira/browse/KUDU-3294
> Project: Kudu
>  Issue Type: Bug
>  Components: ops-tooling
>Reporter: Andrew Wong
>Assignee: Andrew Wong
>Priority: Major
>
> One of the first steps we perform when running {{ksck}} or the rebalancer 
> tool is to resolve the addresses of all tablet servers to initialize the 
> proxies that will be used by each tool. If this step fails, the tool returns 
> that will be used by each tool. If this step fails, the tool returns early, 
> resulting in pitifully barren output along the lines of "an empty cluster", 
> or simply a report that contains no tables, and not much to debug with other 
> than a complaint like the following:
> {code:java}
> Network error: error fetching the cluster metadata from the leader master: 
> unable to resolve address for : Name or service not known {code}
> At worst, we should just skip over the tablet server and treat it as failed 
> in the report.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3294) ksck and rebalancer tools are useless if they can't resolve any of the tserver addresses

2021-06-15 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3294:
-

 Summary: ksck and rebalancer tools are useless if they can't 
resolve any of the tserver addresses
 Key: KUDU-3294
 URL: https://issues.apache.org/jira/browse/KUDU-3294
 Project: Kudu
  Issue Type: Bug
  Components: ops-tooling
Reporter: Andrew Wong


One of the first steps we perform when running {{ksck}} or the rebalancer tool 
is to resolve the addresses of all tablet servers to initialize the proxies that 
will be used by each tool. If this step fails, the tool returns early, 
resulting in pitifully barren output along the lines of "an empty cluster", or 
simply a report that contains no tables, and not much to debug with other than 
a complaint like the following:
{code:java}
Network error: error fetching the cluster metadata from the leader master: 
unable to resolve address for : Name or service not known {code}
At worst, we should just skip over the tablet server and treat it as failed in 
the report.
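
A rough sketch of the suggested behavior, using plain getaddrinfo() as a 
stand-in for Kudu's resolver: resolve each tablet server independently and 
mark failures as unavailable in the report instead of aborting the whole run 
on the first resolution error.

{code}
#include <iostream>
#include <netdb.h>
#include <string>
#include <vector>

struct TServerStatus {
  std::string address;
  bool resolvable;
};

// Resolve each tablet server address independently; a failure marks that one
// server as unresolvable rather than failing the whole ksck/rebalancer run.
std::vector<TServerStatus> ResolveTabletServers(
    const std::vector<std::string>& addresses) {
  std::vector<TServerStatus> statuses;
  for (const auto& addr : addresses) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;
    addrinfo* result = nullptr;
    const bool ok = getaddrinfo(addr.c_str(), nullptr, &hints, &result) == 0;
    if (result != nullptr) freeaddrinfo(result);
    statuses.push_back({addr, ok});
  }
  return statuses;
}

int main() {
  for (const auto& s :
       ResolveTabletServers({"localhost", "no-such-host.invalid"})) {
    std::cout << s.address << ": "
              << (s.resolvable ? "OK" : "unresolvable, treated as failed")
              << std::endl;
  }
  return 0;
}
{code}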



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2302) Leader crashes if it can't resolve DNS address of a peer

2021-06-12 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-2302.
---
Fix Version/s: 1.16
   Resolution: Fixed

> Leader crashes if it can't resolve DNS address of a peer
> 
>
> Key: KUDU-2302
> URL: https://issues.apache.org/jira/browse/KUDU-2302
> Project: Kudu
>  Issue Type: Bug
>  Components: consensus, master, tserver
>Affects Versions: 1.6.0, 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.10.0, 1.10.1, 
> 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0
>Reporter: Todd Lipcon
>Assignee: Andrew Wong
>Priority: Critical
>  Labels: crash, roadmap-candidate, stability
> Fix For: 1.16
>
>
> In BecomeLeader we call:
> {code}
>  CHECK_OK(BecomeLeaderUnlocked());
> {code}
> This will fail if it fails to resolve the address of one of its peers. 
> Instead it should probably continue to be leader but consider attempts to RPC 
> to that peer to be failed due to network resolution (with periodic retries of 
> resolution)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KUDU-2302) Leader crashes if it can't resolve DNS address of a peer

2021-06-10 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong reassigned KUDU-2302:
-

Assignee: Andrew Wong

> Leader crashes if it can't resolve DNS address of a peer
> 
>
> Key: KUDU-2302
> URL: https://issues.apache.org/jira/browse/KUDU-2302
> Project: Kudu
>  Issue Type: Bug
>  Components: consensus, master, tserver
>Affects Versions: 1.6.0, 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.10.0, 1.10.1, 
> 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0
>Reporter: Todd Lipcon
>Assignee: Andrew Wong
>Priority: Critical
>  Labels: crash, roadmap-candidate, stability
>
> In BecomeLeader we call:
> {code}
>  CHECK_OK(BecomeLeaderUnlocked());
> {code}
> This will fail if it fails to resolve the address of one of its peers. 
> Instead it should probably continue to be leader but consider attempts to RPC 
> to that peer to be failed due to network resolution (with periodic retries of 
> resolution)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-3291) Crash when performing a diff scan after delta flush races with a batch of ops that update the same row

2021-06-08 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-3291.
---
Fix Version/s: 1.15.0
   Resolution: Fixed

> Crash when performing a diff scan after delta flush races with a batch of ops 
> that update the same row
> --
>
> Key: KUDU-3291
> URL: https://issues.apache.org/jira/browse/KUDU-3291
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0
>Reporter: Andrew Wong
>Assignee: Andrew Wong
>Priority: Critical
> Fix For: 1.15.0
>
>
> It's possible to run into the following crash:
> {code:java}
> F0604 23:20:50.032124 35483072 delta_store.h:153] Check failed: 
> a.delta_store_id == b.delta_store_id (4445773336 vs. 4445771896)
> *** Check failure stack trace: ***
> *** Aborted at 1622874050 (unix time) try "date -d @1622874050" if you are 
> using GNU date ***
> PC: @ 0x7fff724b033a __pthread_kill
> *** SIGABRT (@0x7fff724b033a) received by PID 69138 (TID 0x1021d6dc0) stack 
> trace: ***
> @ 0x7fff725615fd _sigtramp
> @ 0x7ffeef948568 (unknown)
> @ 0x7fff72437808 abort
> @0x107920599 google::logging_fail()
> @0x10791f4cf google::LogMessage::SendToLog()
> @0x10791fb95 google::LogMessage::Flush()
> @0x107923c9f google::LogMessageFatal::~LogMessageFatal()
> @0x107920b29 google::LogMessageFatal::~LogMessageFatal()
> @0x1009ae07e 
> kudu::tablet::SelectedDeltas::DeltaLessThanFunctor::operator()()
> @0x1009aa561 std::__1::max<>()
> @0x10099c740 kudu::tablet::SelectedDeltas::ProcessDelta()
> @0x10099e719 kudu::tablet::SelectedDeltas::MergeFrom()
> @0x1009a2b30 kudu::tablet::DeltaPreparer<>::SelectDeltas()
> @0x10094a545 kudu::tablet::DeltaFileIterator<>::SelectDeltas()
> @0x10098b10c kudu::tablet::DeltaIteratorMerger::SelectDeltas()
> @0x10097133f 
> kudu::tablet::DeltaApplier::InitializeSelectionVector()
> @0x1056df4fb kudu::MaterializingIterator::MaterializeBlock()
> @0x1056df2d8 kudu::MaterializingIterator::NextBlock()
> @0x1056d1c5b kudu::MergeIterState::PullNextBlock()
> @0x1056d5e62 kudu::MergeIterator::RefillHotHeap()
> @0x1056d4f0b kudu::MergeIterator::Init()
> @0x1006a413d kudu::tablet::Tablet::Iterator::Init()
> @0x1002cb3b9 
> kudu::tablet::DiffScanTest_TestDiffScanAfterDeltaFlush_Test::TestBody()
> @0x1005f1b88 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x1005f1add testing::Test::Run()
> @0x1005f2dd0 testing::TestInfo::Run()
> @0x1005f3807 testing::TestSuite::Run()
> @0x100601b57 testing::internal::UnitTestImpl::RunAllTests()
> @0x100601418 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10060139c testing::UnitTest::Run()
> @0x100476201 RUN_ALL_TESTS()
> @0x100475fa8 main
> {code}
> The [crash 
> line|https://github.com/apache/kudu/blob/e574903ace741a531c49aba15f97e856ea80ca4b/src/kudu/tablet/delta_store.h#L149]
>  assumes that all deltas for a given row that have the same timestamp belong 
> in the same delta store, and it uses this assumption to order the deltas in a 
> diff scan.
> However, this is not true because, unlike the case for MRS flushes, we don't 
> wait for all ops to finish applying before flushing the DMS. This means that 
> a batch containing multiple updates to the same row may be spread across 
> multiple DMSs if we delta flush while the batch of updates is being applied.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3291) Crash when performing a diff scan after delta flush races with a batch of ops that update the same row

2021-06-06 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3291:
--
Code Review: https://gerrit.cloudera.org/c/17547

> Crash when performing a diff scan after delta flush races with a batch of ops 
> that update the same row
> --
>
> Key: KUDU-3291
> URL: https://issues.apache.org/jira/browse/KUDU-3291
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.10.0, 1.10.1, 1.11.0, 1.12.0, 1.11.1, 1.13.0, 1.14.0
>Reporter: Andrew Wong
>Assignee: Andrew Wong
>Priority: Critical
>
> It's possible to run into the following crash:
> {code:java}
> F0604 23:20:50.032124 35483072 delta_store.h:153] Check failed: 
> a.delta_store_id == b.delta_store_id (4445773336 vs. 4445771896)
> *** Check failure stack trace: ***
> *** Aborted at 1622874050 (unix time) try "date -d @1622874050" if you are 
> using GNU date ***
> PC: @ 0x7fff724b033a __pthread_kill
> *** SIGABRT (@0x7fff724b033a) received by PID 69138 (TID 0x1021d6dc0) stack 
> trace: ***
> @ 0x7fff725615fd _sigtramp
> @ 0x7ffeef948568 (unknown)
> @ 0x7fff72437808 abort
> @0x107920599 google::logging_fail()
> @0x10791f4cf google::LogMessage::SendToLog()
> @0x10791fb95 google::LogMessage::Flush()
> @0x107923c9f google::LogMessageFatal::~LogMessageFatal()
> @0x107920b29 google::LogMessageFatal::~LogMessageFatal()
> @0x1009ae07e 
> kudu::tablet::SelectedDeltas::DeltaLessThanFunctor::operator()()
> @0x1009aa561 std::__1::max<>()
> @0x10099c740 kudu::tablet::SelectedDeltas::ProcessDelta()
> @0x10099e719 kudu::tablet::SelectedDeltas::MergeFrom()
> @0x1009a2b30 kudu::tablet::DeltaPreparer<>::SelectDeltas()
> @0x10094a545 kudu::tablet::DeltaFileIterator<>::SelectDeltas()
> @0x10098b10c kudu::tablet::DeltaIteratorMerger::SelectDeltas()
> @0x10097133f 
> kudu::tablet::DeltaApplier::InitializeSelectionVector()
> @0x1056df4fb kudu::MaterializingIterator::MaterializeBlock()
> @0x1056df2d8 kudu::MaterializingIterator::NextBlock()
> @0x1056d1c5b kudu::MergeIterState::PullNextBlock()
> @0x1056d5e62 kudu::MergeIterator::RefillHotHeap()
> @0x1056d4f0b kudu::MergeIterator::Init()
> @0x1006a413d kudu::tablet::Tablet::Iterator::Init()
> @0x1002cb3b9 
> kudu::tablet::DiffScanTest_TestDiffScanAfterDeltaFlush_Test::TestBody()
> @0x1005f1b88 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x1005f1add testing::Test::Run()
> @0x1005f2dd0 testing::TestInfo::Run()
> @0x1005f3807 testing::TestSuite::Run()
> @0x100601b57 testing::internal::UnitTestImpl::RunAllTests()
> @0x100601418 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @0x10060139c testing::UnitTest::Run()
> @0x100476201 RUN_ALL_TESTS()
> @0x100475fa8 main
> {code}
> The [crash 
> line|https://github.com/apache/kudu/blob/e574903ace741a531c49aba15f97e856ea80ca4b/src/kudu/tablet/delta_store.h#L149]
>  assumes that all deltas for a given row that have the same timestamp belong 
> in the same delta store, and it uses this assumption to order the deltas in a 
> diff scan.
> However, this is not true because, unlike the case for MRS flushes, we don't 
> wait for all ops to finish applying before flushing the DMS. This means that 
> a batch containing multiple updates to the same row may be spread across 
> multiple DMSs if we delta flush while the batch of updates is being applied.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3291) Crash when performing a diff scan after delta flush races with a batch of ops that update the same row

2021-06-04 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3291:
-

 Summary: Crash when performing a diff scan after delta flush races 
with a batch of ops that update the same row
 Key: KUDU-3291
 URL: https://issues.apache.org/jira/browse/KUDU-3291
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.14.0, 1.13.0, 1.11.1, 1.12.0, 1.11.0, 1.10.1, 1.10.0
Reporter: Andrew Wong
Assignee: Andrew Wong


It's possible to run into the following crash:
{code:java}
F0604 23:20:50.032124 35483072 delta_store.h:153] Check failed: 
a.delta_store_id == b.delta_store_id (4445773336 vs. 4445771896)
*** Check failure stack trace: ***
*** Aborted at 1622874050 (unix time) try "date -d @1622874050" if you are 
using GNU date ***
PC: @ 0x7fff724b033a __pthread_kill
*** SIGABRT (@0x7fff724b033a) received by PID 69138 (TID 0x1021d6dc0) stack 
trace: ***
@ 0x7fff725615fd _sigtramp
@ 0x7ffeef948568 (unknown)
@ 0x7fff72437808 abort
@0x107920599 google::logging_fail()
@0x10791f4cf google::LogMessage::SendToLog()
@0x10791fb95 google::LogMessage::Flush()
@0x107923c9f google::LogMessageFatal::~LogMessageFatal()
@0x107920b29 google::LogMessageFatal::~LogMessageFatal()
@0x1009ae07e 
kudu::tablet::SelectedDeltas::DeltaLessThanFunctor::operator()()
@0x1009aa561 std::__1::max<>()
@0x10099c740 kudu::tablet::SelectedDeltas::ProcessDelta()
@0x10099e719 kudu::tablet::SelectedDeltas::MergeFrom()
@0x1009a2b30 kudu::tablet::DeltaPreparer<>::SelectDeltas()
@0x10094a545 kudu::tablet::DeltaFileIterator<>::SelectDeltas()
@0x10098b10c kudu::tablet::DeltaIteratorMerger::SelectDeltas()
@0x10097133f kudu::tablet::DeltaApplier::InitializeSelectionVector()
@0x1056df4fb kudu::MaterializingIterator::MaterializeBlock()
@0x1056df2d8 kudu::MaterializingIterator::NextBlock()
@0x1056d1c5b kudu::MergeIterState::PullNextBlock()
@0x1056d5e62 kudu::MergeIterator::RefillHotHeap()
@0x1056d4f0b kudu::MergeIterator::Init()
@0x1006a413d kudu::tablet::Tablet::Iterator::Init()
@0x1002cb3b9 
kudu::tablet::DiffScanTest_TestDiffScanAfterDeltaFlush_Test::TestBody()
@0x1005f1b88 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@0x1005f1add testing::Test::Run()
@0x1005f2dd0 testing::TestInfo::Run()
@0x1005f3807 testing::TestSuite::Run()
@0x100601b57 testing::internal::UnitTestImpl::RunAllTests()
@0x100601418 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@0x10060139c testing::UnitTest::Run()
@0x100476201 RUN_ALL_TESTS()
@0x100475fa8 main
{code}
The [crash 
line|https://github.com/apache/kudu/blob/e574903ace741a531c49aba15f97e856ea80ca4b/src/kudu/tablet/delta_store.h#L149]
 assumes that all deltas for a given row that have the same timestamp belong in 
the same delta store, and it uses this assumption to order the deltas in a diff 
scan.

However, this is not true because, unlike the case for MRS flushes, we don't 
wait for all ops to finish applying before flushing the DMS. This means that a 
batch containing multiple updates to the same row may be spread across multiple 
DMSs if we delta flush while the batch of updates is being applied.
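
To make the broken invariant concrete, here is a small, self-contained C++ 
sketch (the Delta struct and comparator below are simplified stand-ins for the 
real delta-ordering code): the comparator assumes that two deltas with equal 
timestamps come from the same delta store, and a delta flush that lands in the 
middle of applying a batch violates exactly that assumption.
{code:cpp}
#include <cassert>
#include <cstdint>
#include <iostream>

// Simplified stand-in for the fields the real comparator looks at.
struct Delta {
  int64_t timestamp;       // commit timestamp of the op that wrote the delta
  int64_t delta_store_id;  // which DMS/delta file holds the delta
  int64_t store_offset;    // order of the delta within that store
};

// Mirrors the assumption at the crash line: deltas for the same row with
// equal timestamps must live in the same store, so the in-store offset can
// break the tie.
bool DeltaLessThan(const Delta& a, const Delta& b) {
  if (a.timestamp != b.timestamp) {
    return a.timestamp < b.timestamp;
  }
  assert(a.delta_store_id == b.delta_store_id);  // the check that fires
  return a.store_offset < b.store_offset;
}

int main() {
  // One write batch updates the same row twice at a single timestamp, and a
  // delta flush lands between the two applies, so the two deltas end up in
  // different stores and the assertion above fails.
  Delta first  = {100, /*delta_store_id=*/1, /*store_offset=*/0};
  Delta second = {100, /*delta_store_id=*/2, /*store_offset=*/0};
  std::cout << DeltaLessThan(first, second) << "\n";  // aborts here
  return 0;
}
{code}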



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KUDU-3290) Implement Replicate table's data to Kafka(or other Storage System)

2021-06-03 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356981#comment-17356981
 ] 

Andrew Wong edited comment on KUDU-3290 at 6/3/21, 11:54 PM:
-

I'm curious what the use case is for the data stored in Kafka. Is it meant to 
be a physical backup of all ops that are sent to Kudu? A couple of other 
approaches come to mind that don't go quite as deep into Kudu's internals as 
what's outlined here, so I'm curious if they would fit your needs:
 * Periodically run a differential scan in Kudu, similar to an incremental 
backup, but storing the results in Kafka (a rough sketch follows this list). 
Kudu's differential backup allows 
users to supply two (relatively recent) timestamps and Kudu will return the 
logical changes to the table that happened between those timestamps. This 
doesn't work that well if the goal is to get every individual mutation between 
the two timestamps, since diff scans summarize the changes between the two 
timestamps. However, if the process performing this ran frequently enough (e.g. 
every few seconds), getting a real-time replication may not be out of the 
picture. 
[Here's|https://github.com/apache/kudu/blob/master/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala#L78]
 an example of Spark using this kind of scan.
 * With whatever ingestion tool is currently writing to Kudu, also write to 
Kafka. Usually we see pipelines built with Spark or Nifi or Streamsets that 
ingest to Kudu – if you have such a pipeline, duplicating the pipeline into 
Kafka would be another relatively quick solution, though with the caveat that 
failures of either system may need to be worked around.
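
To sketch the first approach, here is a minimal, self-contained C++ outline of 
one iteration of such a periodic job (RowChange, ScanChangesBetween, and 
PublishToKafka are hypothetical stand-ins; real code would use Kudu's diff-scan 
support as used by the backup job, plus a Kafka producer client):
{code:cpp}
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical summary of one logical change returned by a diff scan
// between two timestamps; not the real Kudu client API.
struct RowChange {
  std::string row_key;
  bool is_delete;
  std::string new_values;
};

// Stand-in for "scan the logical changes between two timestamps".
std::vector<RowChange> ScanChangesBetween(int64_t start_ts, int64_t end_ts) {
  (void)start_ts;
  (void)end_ts;
  return {{"user-42", false, "country=US"}};
}

// Stand-in for publishing one change record to a Kafka topic.
void PublishToKafka(const std::string& topic, const RowChange& change) {
  std::cout << topic << " <- " << change.row_key
            << (change.is_delete ? " (delete)" : " " + change.new_values)
            << "\n";
}

int main() {
  // One iteration of the periodic loop: diff-scan since the last checkpoint,
  // push each summarized change to Kafka, then advance the checkpoint.
  int64_t last_ts = 0;
  int64_t now_ts = 1000;  // would come from the cluster's current timestamp
  for (const RowChange& change : ScanChangesBetween(last_ts, now_ts)) {
    PublishToKafka("kudu-changes", change);
  }
  last_ts = now_ts;
  std::cout << "checkpoint advanced to " << last_ts << "\n";
  return 0;
}
{code}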

One concern I would have about the built-in Raft replication to an external 
system is the fact that Kudu Raft replicates all writes before applying the 
operations to the underlying data stores. This means that we may replicate 
operations that fail because a row already exists, and that is only caught 
after replication. Depending on how you are using the Kafka replica, I'm not 
sure the best way to handle this. Perhaps that's okay for your usage, but I can 
see it being confusing to replicate row operations that failed in Kudu.


was (Author: andrew.wong):
I'm curious what the use case is for the data stored in Kafka. Is it meant to 
be a physical backup of all ops that are sent to Kudu? A couple of other 
approaches come to mind that don't go quite as deep into Kudu's internals as 
what's outlined here, so I'm curious if they would fit your needs:
 * Periodically run a differential scan in Kudu, similar to an incremental 
backup, but storing the results in Kafka. Kudu's differential backup allows 
users to supply two (relatively recent) timestamps and Kudu will return the 
logical changes to the table that happened between those timestamps. This 
doesn't work that well if the goal is to get every individual mutation between 
the two timestamps, since diff scans summarize the changes between the two 
timestamps. However, if the process performing this ran frequently enough (e.g. 
every few seconds), getting a real-time replication may not be out of the 
picture. 
[Here's|https://github.com/apache/kudu/blob/master/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala#L78]
 an example of Spark using this kind of scan.
 * With whatever ingestion tool is currently writing to Kudu, also write to 
Kafka. Usually we see pipelines built with Spark or Nifi or Streamsets that 
ingest to Kudu – if you have such a pipeline, duplicating the pipeline into 
Kafka would be another relatively quick solution, though with the caveat that 
failures of either system may need to be worked around.

One concern I would have about the built-in replication to an external system 
is the fact that Kudu Raft replicates all writes before applying the operations 
to the underlying data stores. This means that we may replicate operations that 
fail because a row already exists, and that is only caught after replication. 
Depending on how you are using the Kafka replica, I'm not sure the best way to 
handle this. Perhaps that's okay for your usage, but I can see it being 
confusing to replicate row operations that failed in Kudu.

> Implement Replicate table's data to Kafka(or other Storage System)
> --
>
> Key: KUDU-3290
> URL: https://issues.apache.org/jira/browse/KUDU-3290
> Project: Kudu
>  Issue Type: New Feature
>  Components: tserver
>Reporter: shenxingwuying
>Priority: Critical
>
> h1. background & problem
> We use Kudu to store user profile data. Because of business requirements to 
> exchange and share data between multi-tenant users, which is reasonable in our 
> application scenario, we need to replicate data from one system to another. The

[jira] [Commented] (KUDU-3290) Implement Replicate table's data to Kafka(or other Storage System)

2021-06-03 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356981#comment-17356981
 ] 

Andrew Wong commented on KUDU-3290:
---

I'm curious what the use case is for the data stored in Kafka. Is it meant to 
be a physical backup of all ops that are sent to Kudu? A couple of other 
approaches come to mind that don't go quite as deep into Kudu's internals as 
what's outlined here, so I'm curious if they would fit your needs:
 * Periodically run a differential scan in Kudu, similar to an incremental 
backup, but storing the results in Kafka. Kudu's differential backup allows 
users to supply two (relatively recent) timestamps and Kudu will return the 
logical changes to the table that happened between those timestamps. This 
doesn't work that well if the goal is to get every individual mutation between 
the two timestamps, since diff scans summarize the changes between the two 
timestamps. However, if the process performing this ran frequently enough (e.g. 
every few seconds), getting a real-time replication may not be out of the 
picture. 
[Here's|https://github.com/apache/kudu/blob/master/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala#L78]
 an example of Spark using this kind of scan.
 * With whatever ingestion tool is currently writing to Kudu, also write to 
Kafka. Usually we see pipelines built with Spark or Nifi or Streamsets that 
ingest to Kudu – if you have such a pipeline, duplicating the pipeline into 
Kafka would be another relatively quick solution, though with the caveat that 
failures of either system may need to be worked around.

One concern I would have about the built-in replication to an external system 
is the fact that Kudu Raft replicates all writes before applying the operations 
to the underlying data stores. This means that we may replicate operations that 
fail because a row already exists, and that is only caught after replication. 
Depending on how you are using the Kafka replica, I'm not sure the best way to 
handle this. Perhaps that's okay for your usage, but I can see it being 
confusing to replicate row operations that failed in Kudu.

> Implement Replicate table's data to Kafka(or other Storage System)
> --
>
> Key: KUDU-3290
> URL: https://issues.apache.org/jira/browse/KUDU-3290
> Project: Kudu
>  Issue Type: New Feature
>  Components: tserver
>Reporter: shenxingwuying
>Priority: Critical
>
> h1. background & problem
> We use Kudu to store user profile data. Because of business requirements to 
> exchange and share data between multi-tenant users, which is reasonable in our 
> application scenario, we need to replicate data from one system to another. We 
> picked Kafka as the destination storage system because of our company's 
> current architecture.
> At this time, we have two ideas for solving it.
> h1. two replication schemes
> Generally, a Raft group has three replicas: one leader and two followers. 
> We'll add a replica whose role is Learner. A Learner only receives the data 
> and does not participate in leader election.
> The learner replica's state machine will be a plugin system, e.g.:
>  # We can support KuduEngine, which is just a data backup, like MongoDB's 
> hidden replica.
>  # We can write to a third-party storage system, like Kafka or any other 
> system we need, and replicate data to that system using its client.
> Paxos has a learner role, which only receives data; we need such a role as a 
> new membership type.
> But in Kudu, Learner is already used for copying (recovering) tablet replicas. 
> Maybe we need a new role name; for now, we still use Learner to represent the 
> new role. (We should think over a new role name.)
> In our application scenario, we will replicate data to Kafka, and I will 
> explain the method.
> h2. Learner replication
>  # Add a new replica role; maybe we call it learner, because Paxos has a 
> learner role, which only receives data. We need such a role as a new 
> membership type. But in Kudu, Learner is already used for copying (recovering) 
> tablet replicas. Maybe we need a new role name; for now, we still use Learner 
> to represent the new role. (We should think over a new role name.)
>  # The voters' safepoint for cleaning obsolete WAL is min(leader's max WAL 
> sequence number, followers' max WAL sequence numbers, learner's max WAL 
> sequence number).
>  # The learner is not a voter and does not participate in elections.
>  # Raft can replicate data to the learner.
>  # The learner's apply process works just like a Raft follower's: log entries 
> before the committed index are replicated to Kafka, and once Kafka responds 
> OK, the apply index advances.
>  # We need a Kafka client; it will be added to Kudu as an option, maybe as a 
> compile-time option.
>  # When a kudu-tserver decomi

[jira] [Resolved] (KUDU-3288) tserver segfault when processing DeleteTablet

2021-05-27 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-3288.
---
Fix Version/s: 1.15.0
   Resolution: Duplicate

This is likely a duplicate of KUDU-3268, which is fixed in the upcoming release 
(1.15.0).

> tserver segfault when processing DeleteTablet
> -
>
> Key: KUDU-3288
> URL: https://issues.apache.org/jira/browse/KUDU-3288
> Project: Kudu
>  Issue Type: Bug
>  Components: tserver
>Affects Versions: 1.14.0
>Reporter: mintao
>Priority: Major
> Fix For: 1.15.0
>
>
> In the core dump, the stack:
> {code:java}
> #0  0x0251e403 in 
> kudu::MaintenanceManager::LaunchOp(kudu::MaintenanceOp*) () at 
> /opt/kudu/kudu/src/kudu/util/maintenance_manager.cc:551
> #1  0x0257c98e in operator() (this=0x7f4425076af0) at 
> /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:260
> #2  kudu::ThreadPool::DispatchThread() () at 
> /opt/kudu/kudu/src/kudu/util/threadpool.cc:662
> #3  0x02573e25 in operator() (this=0x6f86fe8) at 
> /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:260
> #4  kudu::Thread::SuperviseThread(void*) () at 
> /opt/kudu/kudu/src/kudu/util/thread.cc:674
> #5  0x7f442c9bfe25 in start_thread () from /lib64/libpthread.so.0
> #6  0x7f442ac95bad in clone () from /lib64/libc.so.6
> {code}
> The local variables:
> {code:java}
> thread_id = 164113
> op_instance = {thread_id = 164113,
>   name = 
> "CompactRowSetsOp(2c61e21e2e0b4caba1736b5c248dd65e)\000\000\000\000\000\000\350\270\344\002",
>  '\000' , 
> "W\000\000\000\345\005\000\000\250ǻ>\001\000\000\000\260ߣi\000\000\000\000\000\030\323\347",
>  '\000' , 
> "P\344\033\313\033\063\\\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\064-><.h
>  8f6Do!^#=12=?( , "onse\001\377\a\000\000\000\000\000\000\000\000"..., 
> duration = {static kUninitialized = -9223372036854775808,
> nano_delta_ = -9223372036854775808}, start_mono_time = {static 
> kNanosecondsPerSecond = 10, static kNanosecondsPerMillisecond = 
> 100,
> static kNanosecondsPerMicrosecond = 1000, static kMicrosecondsPerSecond = 
> 100, nanos_ = 32139819439241529}}
> scoped_cleanup_L582 = 
> trace = 
> sw = 
> {code}
> In the tablet server's log, we saw this:
> {code:java}
> I0526 09:47:39.229526 86465 tablet_replica.cc:291] T 
> 2c61e21e2e0b4caba1736b5c248dd65e P c12ad54315b24a61b8c47ccd7a3ddf7e: stopping 
> tablet replica
> I0526 09:47:39.230662 86464 ts_tablet_manager.cc:1552] T 
> 02e056b7c982476db5bd5249f7806cbd P c12ad54315b24a61b8c47ccd7a3ddf7e: Deleting 
> tablet data with delete state TABLET_DATA_DELETED
> I0526 09:47:39.234947 164344 maintenance_manager.cc:373] P 
> c12ad54315b24a61b8c47ccd7a3ddf7e: Scheduling 
> CompactRowSetsOp(2c61e21e2e0b4caba1736b5c248dd65e): perf score=0.012862
> I0526 09:47:39.234983 86465 raft_consensus.cc:2226] T 
> 2c61e21e2e0b4caba1736b5c248dd65e P c12ad54315b24a61b8c47ccd7a3ddf7e [term 1 
> FOLLOWER]: Raft consensus shutting down.
> I0526 09:47:39.235006 86465 raft_consensus.cc:2255] T 
> 2c61e21e2e0b4caba1736b5c248dd65e P c12ad54315b24a61b8c47ccd7a3ddf7e [term 1 
> FOLLOWER]: Raft consensus is shut down!
> {code}
> The tablet server tried to perform a rowset compaction on a tablet that was 
> being deleted.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KUDU-3258) Expose some kind of transaction dashboard in ksck or the web UI

2021-04-19 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong reassigned KUDU-3258:
-

Assignee: Andrew Wong

> Expose some kind of transaction dashboard in ksck or the web UI
> ---
>
> Key: KUDU-3258
> URL: https://issues.apache.org/jira/browse/KUDU-3258
> Project: Kudu
>  Issue Type: Improvement
>  Components: ops-tooling, transactions
>Reporter: Andrew Wong
>Assignee: Andrew Wong
>Priority: Major
>
> It would be useful to expose the locations and tablet IDs of the 
> TxnStatusManager replicas, and even show their health from a unified front, 
> whether that's the web UI, ksck, or both. Some useful things to know 
> about:
>  - The tablet ID, range, and location of each TxnStatusManager partition
>  - The highest transaction ID per TxnStatusManager partition
>  - In-flight (not COMMITTED or ABORTED) transactions and their current state, 
> though it would also be nice to filter by specific states
>  - Commit timestamp (and other relevant timestamps, if available, reported 
> with physical and logical portions)
>  - We could also consider storing the transaction creation time in the same 
> way that we have a "time created" for tables in the masters
> After some discussion with Alexey, we think it'd be more useful to focus on:
>  * having a separate section in ksck to display the health of the transaction 
> status table
>  * having a separate tool to focus on displaying the business logic of the 
> TxnStatusManager partitions (not the web UI, for now)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-3271) Tablet server crashed when handle scan request

2021-04-08 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-3271.
---
Fix Version/s: 1.13.0
   Resolution: Fixed

I checked out the commit before 163cd25 and copied over the test in the patch. 
After running it a couple times, I ran into:
{code:java}
I0408 22:49:44.993857 54213 ts_tablet_manager.cc:1144] T 
 P dbfd161726d64fa0b01e8a9237fb37d1: Time spent 
starting tablet: real 0.004s   user 0.002s sys 0.002s
I0408 22:49:44.993940 54215 raft_consensus.cc:683] T 
 P dbfd161726d64fa0b01e8a9237fb37d1 [term 1 
LEADER]: Becoming Leader. State: Replica: dbfd161726d64fa0b01e8a9237fb37d1, 
State: Running, Role: LEADER
W0408 22:49:44.993994 54151 reactor.cc:681] Failed to create an outbound 
connection to 255.255.255.255:1 because connect() failed: Network error: 
connect(2) error: Network is unreachable (error 101)
I0408 22:49:44.994019 54215 consensus_queue.cc:227] T 
 P dbfd161726d64fa0b01e8a9237fb37d1 [LEADER]: 
Queue going to LEADER mode. State: All replicated index: 0, Majority replicated 
index: 0, Committed index: 0, Last appended: 0.0, Last appended by leader: 0, 
Current term: 1, Majority size: 1, State: 0, Mode: LEADER, active raft config: 
opid_index: -1 peers { permanent_uuid: "dbfd161726d64fa0b01e8a9237fb37d1" 
member_type: VOTER last_known_addr { host: "127.0.0.1" port: 44157 } }
*** Aborted at 1617947385 (unix time) try "date -d @1617947385" if you are 
using GNU date ***
I0408 22:49:45.024998 54168 tablet_service.cc:2747] Scan: Not found: Scanner 
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence 
id=100, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025013 54167 tablet_service.cc:2747] Scan: Not found: Scanner 
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence 
id=100, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025015 54166 tablet_service.cc:2747] Scan: Not found: Scanner 
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence 
id=101, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025023 54163 tablet_service.cc:2747] Scan: Not found: Scanner 
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence 
id=100, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025087 54167 tablet_service.cc:2747] Scan: Not found: Scanner 
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence 
id=101, remote={username='awong'} at 127.0.0.1:60548
I0408 22:49:45.025157 54167 tablet_service.cc:2747] Scan: Not found: Scanner 
c1c38a30a9cd480b8affda8b74c7872a not found (it may have expired): call sequence 
id=100, remote={username='awong'} at 127.0.0.1:60548
PC: @  0x229eed3 kudu::UnionIterator::HasNext()
*** SIGSEGV (@0x0) received by PID 54140 (TID 0x7fa30cfde700) from PID 0; stack 
trace: ***
@ 0x7fa31d2b9370 (unknown)
@  0x229eed3 kudu::UnionIterator::HasNext()
@   0xb3300c 
kudu::tserver::TabletServiceImpl::HandleContinueScanRequest()
@   0xb45a09 kudu::tserver::TabletServiceImpl::Scan()
@  0x2227b79 kudu::rpc::GeneratedServiceIf::Handle()
@  0x2228839 kudu::rpc::ServicePool::RunThread()
@  0x23af01f kudu::Thread::SuperviseThread()
@ 0x7fa31d2b1dc5 start_thread
@ 0x7fa31b60976d __clone
Segmentation fault {code}
So I think it's safe to say this was indeed addressed by Todd's locking commit.

[~zhangyifan27] If you're able, feel free to pull 163cd25 into your version of 
Kudu to prevent this in the future, or consider upgrading to 1.13 or higher.

> Tablet server crashed when handle scan request
> --
>
> Key: KUDU-3271
> URL: https://issues.apache.org/jira/browse/KUDU-3271
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: YifanZhang
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: tablet-52a743.log
>
>
> We found that one of the Kudu tablet servers crashed while handling a scan 
> request. The scanned table didn't have any row operations at that time. This 
> issue has only come up once so far.
> Coredump stack is:
> {code:java}
> Program terminated with signal 11, Segmentation fault.
> (gdb) bt
> #0  kudu::tablet::DeltaApplier::HasNext (this=) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/delta_applier.cc:84
> #1  0x02185900 in kudu::UnionIterator::HasNext (this=) 
> at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:1051
> #2  0x00a2ea8f in kudu::tserver::ScannerManager::UnregisterScanner 
> (this=0x4fea140, scanner_id=...) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.cc:195
> #3  0x009e

[jira] [Commented] (KUDU-3271) Tablet server crashed when handle scan request

2021-04-08 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317570#comment-17317570
 ] 

Andrew Wong commented on KUDU-3271:
---

I suspect this may have been fixed by 
[163cd25|https://github.com/apache/kudu/commit/163cd25] which landed in 1.13. 
I'll try checking out the code prior to that change and seeing if this is 
reproducible.

> Tablet server crashed when handle scan request
> --
>
> Key: KUDU-3271
> URL: https://issues.apache.org/jira/browse/KUDU-3271
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: YifanZhang
>Priority: Major
> Attachments: tablet-52a743.log
>
>
> We found that one of the Kudu tablet servers crashed while handling a scan 
> request. The scanned table didn't have any row operations at that time. This 
> issue has only come up once so far.
> Coredump stack is:
> {code:java}
> Program terminated with signal 11, Segmentation fault.
> (gdb) bt
> #0  kudu::tablet::DeltaApplier::HasNext (this=) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/delta_applier.cc:84
> #1  0x02185900 in kudu::UnionIterator::HasNext (this=) 
> at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:1051
> #2  0x00a2ea8f in kudu::tserver::ScannerManager::UnregisterScanner 
> (this=0x4fea140, scanner_id=...) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.cc:195
> #3  0x009e7adf in ~ScopedUnregisterScanner (this=0x7f2d72167610, 
> __in_chrg=) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.h:179
> #4  kudu::tserver::TabletServiceImpl::HandleContinueScanRequest 
> (this=this@entry=0x60edef0, req=req@entry=0x9582e880, 
> rpc_context=rpc_context@entry=0x8151d7800,     
> result_collector=result_collector@entry=0x7f2d721679f0, 
> has_more_results=has_more_results@entry=0x7f2d721678f9, 
> error_code=error_code@entry=0x7f2d721678fc)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2737
> #5  0x009fb009 in kudu::tserver::TabletServiceImpl::Scan 
> (this=0x60edef0, req=0x9582e880, resp=0xb87b16de0, context=0x8151d7800)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1907
> #6  0x0210f019 in operator() (__args#2=0x8151d7800, 
> __args#1=0xb87b16de0, __args#0=, this=0x4e0c7708) at 
> /usr/include/c++/4.8.2/functional:2471
> #7  kudu::rpc::GeneratedServiceIf::Handle (this=0x60edef0, call= out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139
> #8  0x0210fcd9 in kudu::rpc::ServicePool::RunThread (this=0x50fb9e0) 
> at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225
> #9  0x0228ecaf in operator() (this=0xc1a58c28) at 
> /usr/include/c++/4.8.2/functional:2471
> #10 kudu::Thread::SuperviseThread (arg=0xc1a58c00) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:674#11 
> 0x7f2de6b8adc5 in start_thread () from /lib64/libpthread.so.0#12 
> 0x7f2de4e6873d in clone () from /lib64/libc.so.6
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-3268) Crash in TabletServerDiskErrorTest.TestRandomOpSequence

2021-04-08 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-3268.
---
Fix Version/s: 1.15.0
   Resolution: Fixed

> Crash in TabletServerDiskErrorTest.TestRandomOpSequence
> ---
>
> Key: KUDU-3268
> URL: https://issues.apache.org/jira/browse/KUDU-3268
> Project: Kudu
>  Issue Type: Bug
>  Components: test, tserver
>Reporter: Andrew Wong
>Priority: Major
> Fix For: 1.15.0
>
> Attachments: tablet_server-test.3.txt.gz
>
>
> A pre-commit failed with the following crash when attempting to launch an op 
> after stopping a replica:
> {code:java}
> I0323 18:15:01.078991 23854 maintenance_manager.cc:373] P 
> c8a93089db0041f5930b9fb1832714ed: Scheduling 
> CompactRowSetsOp(): perf score=1.012452
> I0323 18:15:01.079111 21067 tablet_server-test.cc:852] Tablet server 
> responded with: timestamp: 6621279441214984192
> I0323 18:15:01.079317 23789 maintenance_manager.cc:594] P 
> c8a93089db0041f5930b9fb1832714ed: 
> UndoDeltaBlockGCOp() complete. Timing: real 
> 0.000suser 0.000s sys 0.000s Metrics: 
> {"cfile_init":1,"lbm_read_time_us":73,"lbm_reads_lt_1ms":4}
> E0323 18:15:01.080865 23788 cfile_reader.cc:591] Encountered corrupted CFile 
> in filesystem block: 4124746176525068430
> I0323 18:15:01.080960 23788 ts_tablet_manager.cc:1774] T 
>  P c8a93089db0041f5930b9fb1832714ed: failing 
> tablet
> I0323 18:15:01.080950 21067 tablet_server-test.cc:852] Tablet server 
> responded with: timestamp: 6621279441223315456
> I0323 18:15:01.081243 24138 tablet_replica.cc:324] T 
>  P c8a93089db0041f5930b9fb1832714ed: stopping 
> tablet replica
> I0323 18:15:01.081670 21067 tablet_server-test.cc:852] Tablet server 
> responded with: error {
>   code: TABLET_NOT_RUNNING
>   status {
> code: ILLEGAL_STATE
> message: "Tablet not RUNNING: STOPPING"
>   }
> }
> I0323 18:15:01.081777 21067 tablet_server-test.cc:890] Failure was caught by 
> an op!
> W0323 18:15:01.082907 23788 tablet_mm_ops.cc:176] T 
>  P c8a93089db0041f5930b9fb1832714ed: 
> Compaction failed on : Corruption: Flush to 
> disk failed: checksum error on CFile block 4124746176525068430 at offset=1006 
> size=24: Checksum does not match: 3582029077 vs expected 3582029077
> I0323 18:15:01.082957 23788 maintenance_manager.cc:594] P 
> c8a93089db0041f5930b9fb1832714ed: 
> CompactRowSetsOp() complete. Timing: real 
> 0.004s  user 0.003s sys 0.000s Metrics: 
> {"cfile_cache_miss":3,"cfile_cache_miss_bytes":92,"delta_iterators_relevant":2,"dirs.queue_time_us":630,"dirs.run_cpu_time_us":368,"dirs.run_wall_time_us":2220,"lbm_read_time_us":54,"lbm_reads_lt_1ms":3,"lbm_write_time_us":168,"lbm_writes_lt_1ms":6,"num_input_rowsets":2,"spinlock_wait_cycles":1792,"tablet-open.queue_time_us":135,"thread_start_us":382,"threads_started":5}
> I0323 18:15:01.083369 23854 maintenance_manager.cc:373] P 
> c8a93089db0041f5930b9fb1832714ed: Scheduling 
> CompactRowSetsOp(): perf score=1.012452
> *** Aborted at 1616523301 (unix time) try "date -d @1616523301" if you are 
> using GNU date ***
> I0323 18:15:01.083519 24138 raft_consensus.cc:2226] T 
>  P c8a93089db0041f5930b9fb1832714ed [term 1 
> LEADER]: Raft consensus shutting down.
> I0323 18:15:01.083653 24138 raft_consensus.cc:2255] T 
>  P c8a93089db0041f5930b9fb1832714ed [term 1 
> FOLLOWER]: Raft consensus is shut down!
> I0323 18:15:01.085090 21067 tablet_server-test.cc:894] Tablet was 
> successfully failed
> I0323 18:15:01.085439 21067 tablet_server.cc:166] TabletServer@127.0.0.1:0 
> shutting down...
> PC: @ 0x7ff97596de0d kudu::MaintenanceManager::LaunchOp()
> *** SIGSEGV (@0x30) received by PID 21067 (TID 0x7ff96343b700) from PID 48; 
> stack trace: ***
> @ 0x7ff976846980 (unknown) at ??:0
> @ 0x7ff97596de0d kudu::MaintenanceManager::LaunchOp() at ??:0
> @ 0x7ff97596b538 
> _ZZN4kudu18MaintenanceManager18RunSchedulerThreadEvENKUlvE_clEv at ??:0
> @ 0x7ff97596f124 
> _ZNSt17_Function_handlerIFvvEZN4kudu18MaintenanceManager18RunSchedulerThreadEvEUlvE_E9_M_invokeERKSt9_Any_data
>  at ??:0
> @ 0x7ff977e2bcf4 std::function<>::operator()() at ??:0
> @ 0x7ff975a05e6e kudu::ThreadPool::DispatchThread() at ??:0
> @ 0x7ff975a06757 _ZZN4kudu10ThreadPool12CreateThreadEvENKUlvE_clEv at 
> ??:0
> @ 0x7ff975a07e7b 
> _ZNSt17_Function_handlerIFvvEZN4kudu10ThreadPool12CreateThreadEvEUlvE_E9_M_invokeERKSt9_Any_data
>  at ??:0
> @ 0x7ff977e2bcf4 std::function

[jira] [Comment Edited] (KUDU-3271) Tablet server crashed when handle scan request

2021-04-02 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314068#comment-17314068
 ] 

Andrew Wong edited comment on KUDU-3271 at 4/2/21, 8:20 PM:


[~zhangyifan27] Thanks for reporting this. Is there anything in the INFO logs 
that you think might be useful in getting to the bottom of this? Do you know 
what scans were running around this time? Were there any special or new 
workloads running that hadn't run before?


was (Author: andrew.wong):
[~zhangyifan27] Thanks for reporting this. Is there anything in the INFO logs 
that you think might be useful in getting to the bottom of this?

> Tablet server crashed when handle scan request
> --
>
> Key: KUDU-3271
> URL: https://issues.apache.org/jira/browse/KUDU-3271
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: YifanZhang
>Priority: Major
>
> We found that one of the Kudu tablet servers crashed while handling a scan 
> request. The scanned table didn't have any row operations at that time. This 
> issue has only come up once so far.
> Coredump stack is:
> {code:java}
> Program terminated with signal 11, Segmentation fault.
> (gdb) bt
> #0  kudu::tablet::DeltaApplier::HasNext (this=) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/delta_applier.cc:84
> #1  0x02185900 in kudu::UnionIterator::HasNext (this=) 
> at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:1051
> #2  0x00a2ea8f in kudu::tserver::ScannerManager::UnregisterScanner 
> (this=0x4fea140, scanner_id=...) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.cc:195
> #3  0x009e7adf in ~ScopedUnregisterScanner (this=0x7f2d72167610, 
> __in_chrg=) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.h:179
> #4  kudu::tserver::TabletServiceImpl::HandleContinueScanRequest 
> (this=this@entry=0x60edef0, req=req@entry=0x9582e880, 
> rpc_context=rpc_context@entry=0x8151d7800,     
> result_collector=result_collector@entry=0x7f2d721679f0, 
> has_more_results=has_more_results@entry=0x7f2d721678f9, 
> error_code=error_code@entry=0x7f2d721678fc)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2737
> #5  0x009fb009 in kudu::tserver::TabletServiceImpl::Scan 
> (this=0x60edef0, req=0x9582e880, resp=0xb87b16de0, context=0x8151d7800)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1907
> #6  0x0210f019 in operator() (__args#2=0x8151d7800, 
> __args#1=0xb87b16de0, __args#0=, this=0x4e0c7708) at 
> /usr/include/c++/4.8.2/functional:2471
> #7  kudu::rpc::GeneratedServiceIf::Handle (this=0x60edef0, call= out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139
> #8  0x0210fcd9 in kudu::rpc::ServicePool::RunThread (this=0x50fb9e0) 
> at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225
> #9  0x0228ecaf in operator() (this=0xc1a58c28) at 
> /usr/include/c++/4.8.2/functional:2471
> #10 kudu::Thread::SuperviseThread (arg=0xc1a58c00) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:674#11 
> 0x7f2de6b8adc5 in start_thread () from /lib64/libpthread.so.0#12 
> 0x7f2de4e6873d in clone () from /lib64/libc.so.6
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3271) Tablet server crashed when handle scan request

2021-04-02 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314068#comment-17314068
 ] 

Andrew Wong commented on KUDU-3271:
---

[~zhangyifan27] Thanks for reporting this. Is there anything in the INFO logs 
that you think might be useful in getting to the bottom of this?

> Tablet server crashed when handle scan request
> --
>
> Key: KUDU-3271
> URL: https://issues.apache.org/jira/browse/KUDU-3271
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.12.0
>Reporter: YifanZhang
>Priority: Major
>
> We found that one of the Kudu tablet servers crashed while handling a scan 
> request. The scanned table didn't have any row operations at that time. This 
> issue has only come up once so far.
> Coredump stack is:
> {code:java}
> Program terminated with signal 11, Segmentation fault.
> (gdb) bt
> #0  kudu::tablet::DeltaApplier::HasNext (this=) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/delta_applier.cc:84
> #1  0x02185900 in kudu::UnionIterator::HasNext (this=) 
> at /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:1051
> #2  0x00a2ea8f in kudu::tserver::ScannerManager::UnregisterScanner 
> (this=0x4fea140, scanner_id=...) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.cc:195
> #3  0x009e7adf in ~ScopedUnregisterScanner (this=0x7f2d72167610, 
> __in_chrg=) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/scanners.h:179
> #4  kudu::tserver::TabletServiceImpl::HandleContinueScanRequest 
> (this=this@entry=0x60edef0, req=req@entry=0x9582e880, 
> rpc_context=rpc_context@entry=0x8151d7800,     
> result_collector=result_collector@entry=0x7f2d721679f0, 
> has_more_results=has_more_results@entry=0x7f2d721678f9, 
> error_code=error_code@entry=0x7f2d721678fc)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2737
> #5  0x009fb009 in kudu::tserver::TabletServiceImpl::Scan 
> (this=0x60edef0, req=0x9582e880, resp=0xb87b16de0, context=0x8151d7800)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1907
> #6  0x0210f019 in operator() (__args#2=0x8151d7800, 
> __args#1=0xb87b16de0, __args#0=, this=0x4e0c7708) at 
> /usr/include/c++/4.8.2/functional:2471
> #7  kudu::rpc::GeneratedServiceIf::Handle (this=0x60edef0, call= out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.cc:139
> #8  0x0210fcd9 in kudu::rpc::ServicePool::RunThread (this=0x50fb9e0) 
> at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_pool.cc:225
> #9  0x0228ecaf in operator() (this=0xc1a58c28) at 
> /usr/include/c++/4.8.2/functional:2471
> #10 kudu::Thread::SuperviseThread (arg=0xc1a58c00) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/util/thread.cc:674#11 
> 0x7f2de6b8adc5 in start_thread () from /lib64/libpthread.so.0#12 
> 0x7f2de4e6873d in clone () from /lib64/libc.so.6
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3268) Crash in TabletServerDiskErrorTest.TestRandomOpSequence

2021-03-23 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3268:
--
Description: 
A pre-commit failed with the following crash when attempting to launch an op 
after stopping a replica:
{code:java}
I0323 18:15:01.078991 23854 maintenance_manager.cc:373] P 
c8a93089db0041f5930b9fb1832714ed: Scheduling 
CompactRowSetsOp(): perf score=1.012452
I0323 18:15:01.079111 21067 tablet_server-test.cc:852] Tablet server responded 
with: timestamp: 6621279441214984192
I0323 18:15:01.079317 23789 maintenance_manager.cc:594] P 
c8a93089db0041f5930b9fb1832714ed: 
UndoDeltaBlockGCOp() complete. Timing: real 
0.000s  user 0.000s sys 0.000s Metrics: 
{"cfile_init":1,"lbm_read_time_us":73,"lbm_reads_lt_1ms":4}
E0323 18:15:01.080865 23788 cfile_reader.cc:591] Encountered corrupted CFile in 
filesystem block: 4124746176525068430
I0323 18:15:01.080960 23788 ts_tablet_manager.cc:1774] T 
 P c8a93089db0041f5930b9fb1832714ed: failing 
tablet
I0323 18:15:01.080950 21067 tablet_server-test.cc:852] Tablet server responded 
with: timestamp: 6621279441223315456
I0323 18:15:01.081243 24138 tablet_replica.cc:324] T 
 P c8a93089db0041f5930b9fb1832714ed: stopping 
tablet replica
I0323 18:15:01.081670 21067 tablet_server-test.cc:852] Tablet server responded 
with: error {
  code: TABLET_NOT_RUNNING
  status {
code: ILLEGAL_STATE
message: "Tablet not RUNNING: STOPPING"
  }
}
I0323 18:15:01.081777 21067 tablet_server-test.cc:890] Failure was caught by an 
op!
W0323 18:15:01.082907 23788 tablet_mm_ops.cc:176] T 
 P c8a93089db0041f5930b9fb1832714ed: Compaction 
failed on : Corruption: Flush to disk failed: 
checksum error on CFile block 4124746176525068430 at offset=1006 size=24: 
Checksum does not match: 3582029077 vs expected 3582029077
I0323 18:15:01.082957 23788 maintenance_manager.cc:594] P 
c8a93089db0041f5930b9fb1832714ed: 
CompactRowSetsOp() complete. Timing: real 
0.004suser 0.003s sys 0.000s Metrics: 
{"cfile_cache_miss":3,"cfile_cache_miss_bytes":92,"delta_iterators_relevant":2,"dirs.queue_time_us":630,"dirs.run_cpu_time_us":368,"dirs.run_wall_time_us":2220,"lbm_read_time_us":54,"lbm_reads_lt_1ms":3,"lbm_write_time_us":168,"lbm_writes_lt_1ms":6,"num_input_rowsets":2,"spinlock_wait_cycles":1792,"tablet-open.queue_time_us":135,"thread_start_us":382,"threads_started":5}
I0323 18:15:01.083369 23854 maintenance_manager.cc:373] P 
c8a93089db0041f5930b9fb1832714ed: Scheduling 
CompactRowSetsOp(): perf score=1.012452
*** Aborted at 1616523301 (unix time) try "date -d @1616523301" if you are 
using GNU date ***
I0323 18:15:01.083519 24138 raft_consensus.cc:2226] T 
 P c8a93089db0041f5930b9fb1832714ed [term 1 
LEADER]: Raft consensus shutting down.
I0323 18:15:01.083653 24138 raft_consensus.cc:2255] T 
 P c8a93089db0041f5930b9fb1832714ed [term 1 
FOLLOWER]: Raft consensus is shut down!
I0323 18:15:01.085090 21067 tablet_server-test.cc:894] Tablet was successfully 
failed
I0323 18:15:01.085439 21067 tablet_server.cc:166] TabletServer@127.0.0.1:0 
shutting down...
PC: @ 0x7ff97596de0d kudu::MaintenanceManager::LaunchOp()
*** SIGSEGV (@0x30) received by PID 21067 (TID 0x7ff96343b700) from PID 48; 
stack trace: ***
@ 0x7ff976846980 (unknown) at ??:0
@ 0x7ff97596de0d kudu::MaintenanceManager::LaunchOp() at ??:0
@ 0x7ff97596b538 
_ZZN4kudu18MaintenanceManager18RunSchedulerThreadEvENKUlvE_clEv at ??:0
@ 0x7ff97596f124 
_ZNSt17_Function_handlerIFvvEZN4kudu18MaintenanceManager18RunSchedulerThreadEvEUlvE_E9_M_invokeERKSt9_Any_data
 at ??:0
@ 0x7ff977e2bcf4 std::function<>::operator()() at ??:0
@ 0x7ff975a05e6e kudu::ThreadPool::DispatchThread() at ??:0
@ 0x7ff975a06757 _ZZN4kudu10ThreadPool12CreateThreadEvENKUlvE_clEv at 
??:0
@ 0x7ff975a07e7b 
_ZNSt17_Function_handlerIFvvEZN4kudu10ThreadPool12CreateThreadEvEUlvE_E9_M_invokeERKSt9_Any_data
 at ??:0
@ 0x7ff977e2bcf4 std::function<>::operator()() at ??:0
@ 0x7ff9759f8913 kudu::Thread::SuperviseThread() at ??:0
@ 0x7ff97683b6db start_thread at ??:0
@ 0x7ff97388e71f clone at ??:0
{code}

Attached the full logs -- seems there's something unsafe about how we 
unregister ops (maybe from the fix for KUDU-3149?) when racing with the 
scheduler thread.


  was:
A pre-commit failed with the following crash when attempting to launch an op 
after stopping a replica:
{code:java}
I0323 18:15:01.079317 23789 maintenance_manager.cc:594] P 
c8a93089db0041f5930b9fb1832714ed: 
UndoDeltaBlockGCOp() complete. Timing: real 
0.000s user 0.00

[jira] [Created] (KUDU-3268) Crash in TabletServerDiskErrorTest.TestRandomOpSequence

2021-03-23 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3268:
-

 Summary: Crash in TabletServerDiskErrorTest.TestRandomOpSequence
 Key: KUDU-3268
 URL: https://issues.apache.org/jira/browse/KUDU-3268
 Project: Kudu
  Issue Type: Bug
  Components: test, tserver
Reporter: Andrew Wong
 Attachments: tablet_server-test.3.txt.gz

A pre-commit failed with the following crash when attempting to launch an op 
after stopping a replica:
{code:java}
I0323 18:15:01.079317 23789 maintenance_manager.cc:594] P 
c8a93089db0041f5930b9fb1832714ed: 
UndoDeltaBlockGCOp() complete. Timing: real 
0.000s user 0.000s sys 0.000s Metrics: 
{"cfile_init":1,"lbm_read_time_us":73,"lbm_reads_lt_1ms":4}I0323 
18:15:01.079317 23789 maintenance_manager.cc:594] P 
c8a93089db0041f5930b9fb1832714ed: 
UndoDeltaBlockGCOp() complete. Timing: real 
0.000s user 0.000s sys 0.000s Metrics: 
{"cfile_init":1,"lbm_read_time_us":73,"lbm_reads_lt_1ms":4}E0323 
18:15:01.080865 23788 cfile_reader.cc:591] Encountered corrupted CFile in 
filesystem block: 4124746176525068430I0323 18:15:01.080960 23788 
ts_tablet_manager.cc:1774] T  P 
c8a93089db0041f5930b9fb1832714ed: failing tabletI0323 18:15:01.080950 21067 
tablet_server-test.cc:852] Tablet server responded with: timestamp: 
6621279441223315456I0323 18:15:01.081243 24138 tablet_replica.cc:324] T 
 P c8a93089db0041f5930b9fb1832714ed: stopping 
tablet replicaI0323 18:15:01.081670 21067 tablet_server-test.cc:852] Tablet 
server responded with: error {  code: TABLET_NOT_RUNNING  status {    code: 
ILLEGAL_STATE    message: "Tablet not RUNNING: STOPPING"  }}I0323 
18:15:01.081777 21067 tablet_server-test.cc:890] Failure was caught by an 
op!W0323 18:15:01.082907 23788 tablet_mm_ops.cc:176] T 
 P c8a93089db0041f5930b9fb1832714ed: Compaction 
failed on : Corruption: Flush to disk failed: 
checksum error on CFile block 4124746176525068430 at offset=1006 size=24: 
Checksum does not match: 3582029077 vs expected 3582029077I0323 18:15:01.082957 
23788 maintenance_manager.cc:594] P c8a93089db0041f5930b9fb1832714ed: 
CompactRowSetsOp() complete. Timing: real 
0.004s user 0.003s sys 0.000s Metrics: 
{"cfile_cache_miss":3,"cfile_cache_miss_bytes":92,"delta_iterators_relevant":2,"dirs.queue_time_us":630,"dirs.run_cpu_time_us":368,"dirs.run_wall_time_us":2220,"lbm_read_time_us":54,"lbm_reads_lt_1ms":3,"lbm_write_time_us":168,"lbm_writes_lt_1ms":6,"num_input_rowsets":2,"spinlock_wait_cycles":1792,"tablet-open.queue_time_us":135,"thread_start_us":382,"threads_started":5}I0323
 18:15:01.083369 23854 maintenance_manager.cc:373] P 
c8a93089db0041f5930b9fb1832714ed: Scheduling 
CompactRowSetsOp(): perf score=1.012452*** 
Aborted at 1616523301 (unix time) try "date -d @1616523301" if you are using 
GNU date ***I0323 18:15:01.083519 24138 raft_consensus.cc:2226] T 
 P c8a93089db0041f5930b9fb1832714ed [term 1 
LEADER]: Raft consensus shutting down.I0323 18:15:01.083653 24138 
raft_consensus.cc:2255] T  P 
c8a93089db0041f5930b9fb1832714ed [term 1 FOLLOWER]: Raft consensus is shut 
down!I0323 18:15:01.085090 21067 tablet_server-test.cc:894] Tablet was 
successfully failedI0323 18:15:01.085439 21067 tablet_server.cc:166] 
TabletServer@127.0.0.1:0 shutting down...PC: @     0x7ff97596de0d 
kudu::MaintenanceManager::LaunchOp()*** SIGSEGV (@0x30) received by PID 21067 
(TID 0x7ff96343b700) from PID 48; stack trace: ***    @     0x7ff976846980 
(unknown) at ??:0    @     0x7ff97596de0d kudu::MaintenanceManager::LaunchOp() 
at ??:0    @     0x7ff97596b538 
_ZZN4kudu18MaintenanceManager18RunSchedulerThreadEvENKUlvE_clEv at ??:0    @    
 0x7ff97596f124 
_ZNSt17_Function_handlerIFvvEZN4kudu18MaintenanceManager18RunSchedulerThreadEvEUlvE_E9_M_invokeERKSt9_Any_data
 at ??:0    @     0x7ff977e2bcf4 std::function<>::operator()() at ??:0    @     
0x7ff975a05e6e kudu::ThreadPool::DispatchThread() at ??:0    @     
0x7ff975a06757 _ZZN4kudu10ThreadPool12CreateThreadEvENKUlvE_clEv at ??:0    @   
  0x7ff975a07e7b 
_ZNSt17_Function_handlerIFvvEZN4kudu10ThreadPool12CreateThreadEvEUlvE_E9_M_invokeERKSt9_Any_data
 at ??:0    @     0x7ff977e2bcf4 std::function<>::operator()() at ??:0    @     
0x7ff9759f8913 kudu::Thread::SuperviseThread() at ??:0    @     0x7ff97683b6db 
start_thread at ??:0    @     0x7ff97388e71f clone at ??:0 {code}

Attached the full logs -- seems there's something unsafe about how we 
unregister ops (maybe from the fix for KUDU-3149?) when racing with the 
scheduler thread.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-3154) RangerClientTestBase.TestLogging sometimes fails

2021-03-20 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-3154.
---
Fix Version/s: 1.13.0
   Resolution: Fixed

The merged patch fixed it in our environment. Separately, we've seen that 
exceptions in the Ranger client are what typically causes these kinds of hangs.

> RangerClientTestBase.TestLogging sometimes fails
> 
>
> Key: KUDU-3154
> URL: https://issues.apache.org/jira/browse/KUDU-3154
> Project: Kudu
>  Issue Type: Bug
>  Components: ranger, test
>Affects Versions: 1.13.0
>Reporter: Alexey Serbin
>Priority: Major
> Fix For: 1.13.0
>
> Attachments: kudu-3154_jstacks.txt, ranger_client-test.txt, 
> ranger_client-test.txt.xz
>
>
> The {{RangerClientTestBase.TestLogging}} scenario of the 
> {{ranger_client-test}} sometimes fails (all types of builds) with error 
> message like below:
> {noformat}
> src/kudu/ranger/ranger_client-test.cc:398: Failure
> Failed
>   
> Bad status: Timed out: timed out while in flight  
>   
> I0620 07:06:02.907177  1140 server.cc:247] Received an EOF from the 
> subprocess  
> I0620 07:06:02.910923  1137 server.cc:317] get failed, inbound queue shut 
> down: Aborted:
> I0620 07:06:02.910964  1141 server.cc:380] outbound queue shut down: Aborted: 
>   
> I0620 07:06:02.910995  1138 server.cc:317] get failed, inbound queue shut 
> down: Aborted:
> I0620 07:06:02.910984  1139 server.cc:317] get failed, inbound queue shut 
> down: Aborted:
> {noformat}
> The log is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3258) Expose some kind of transaction dashboard in ksck or the web UI

2021-03-16 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3258:
--
Description: 
It would be useful to expose the locations and tablet IDs of the 
TxnStatusManager replicas, and even show their health from a unified front, 
whether that's the web UI, ksck, or both. Some useful things to know about:
 - The tablet ID, range, and location of each TxnStatusManager partition
 - The highest transaction ID per TxnStatusManager partition
 - In-flight (not COMMITTED or ABORTED) transactions and their current state, 
though it would also be nice to filter by specific states
 - Commit timestamp (and other relevant timestamps, if available, reported with 
physical and logical portions)
 - We could also consider storing the transaction creation time in the same way 
that we have a "time created" for tables in the masters

After some discussion with Alexey, we think it'd be more useful to focus on:
 * having a separate section in ksck to display the health of the transaction 
status table
 * having a separate tool to focus on displaying the business logic of the 
TxnStatusManager partitions (not the web UI, for now)

  was:
It would be useful to expose the locations and tablet IDs of the 
TxnStatusManager replicas, and even show their health from a unified front, 
whether that's the web UI, ksck, or both. Some useful things to know about:
 - The tablet ID, range, and location of each TxnStatusManager partition
 - The highest transaction ID per TxnStatusManager partition
 - In-flight (not COMMITTED or ABORTED) transactions and their current state


> Expose some kind of transaction dashboard in ksck or the web UI
> ---
>
> Key: KUDU-3258
> URL: https://issues.apache.org/jira/browse/KUDU-3258
> Project: Kudu
>  Issue Type: Improvement
>  Components: ops-tooling, transactions
>Reporter: Andrew Wong
>Priority: Major
>
> It would be useful to expose the locations and tablet IDs of the 
> TxnStatusManager replicas, and even show their health from a unified front, 
> whether that's the web UI, ksck, or both. Some useful things to know 
> about:
>  - The tablet ID, range, and location of each TxnStatusManager partition
>  - The highest transaction ID per TxnStatusManager partition
>  - In-flight (not COMMITTED or ABORTED) transactions and their current state, 
> though it would also be nice to filter by specific states
>  - Commit timestamp (and other relevant timestamps, if available, reported 
> with physical and logical portions)
>  - We could also consider storing the transaction creation time in the same 
> way that we have a "time created" for tables in the masters
> After some discussion with Alexey, we think it'd be more useful to focus on:
>  * having a separate section in ksck to display the health of the transaction 
> status table
>  * having a separate tool to focus on displaying the business logic of the 
> TxnStatusManager partitions (not the web UI, for now)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-3226) Validate List of Masters for kudu ksck

2021-03-12 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-3226.
---
Fix Version/s: 1.15.0
   Resolution: Fixed

> Validate List of Masters for kudu ksck
> --
>
> Key: KUDU-3226
> URL: https://issues.apache.org/jira/browse/KUDU-3226
> Project: Kudu
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Abhishek
>Priority: Minor
>  Labels: beginner, newbie, trivial
> Fix For: 1.15.0
>
>
> I recently accidentally included a list of masters where I fat-fingered the 
> host names and included the same node twice.
> I got some stack trace and an error message about duplicate keys inserted 
> into a map.  I eventually figured out what I did wrong, but the error 
> condition was not super helpful.
> Please add an early validation step that explicitly captures this condition 
> and provides a helpful error message that includes the host name that was 
> duplicated on the command line.
> This happened for me with {{kudu ksck}} but there may be other candidates as 
> well.
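
For illustration, a minimal sketch of the kind of early validation requested here 
(this is not the patch that went into 1.15.0, and the function name below is made 
up): detect a repeated entry in the comma-separated master list and name it in the 
error before building any keyed map.
{code:cpp}
#include <iostream>
#include <set>
#include <sstream>
#include <string>

// Returns the first duplicated entry in 'master_csv', or "" if all entries
// are distinct. A real implementation would also normalize case and resolve
// host:port pairs before comparing.
std::string FindDuplicateMaster(const std::string& master_csv) {
  std::set<std::string> seen;
  std::stringstream ss(master_csv);
  std::string addr;
  while (std::getline(ss, addr, ',')) {
    if (!seen.insert(addr).second) return addr;
  }
  return "";
}

int main() {
  const std::string masters = "master-1:7051,master-2:7051,master-1:7051";
  const std::string dup = FindDuplicateMaster(masters);
  if (!dup.empty()) {
    std::cerr << "duplicate master address given on the command line: "
              << dup << std::endl;
    return 1;
  }
  return 0;
}
{code}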



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3261) Support updates and deletes in transactions

2021-03-11 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3261:
--
Description: 
Kudu currently only supports multi-row, multi-partition transactions for INSERT 
and INSERT_IGNORE operations. We should consider extending the Kudu tablet 
store to support:
 - Tracking Mutations in the MRS that are associated with a transaction
 - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per 
rowset per transaction. These delta trackers should be merged with the main 
delta trackers of each DRS. I'm not sure if it will be helpful, but I have [a 
patch|https://gerrit.cloudera.org/c/16387/] to encapsulate some of the delta 
applying logic – I suspect it might be useful in defining a delta iterator that 
spits out delta keys with a singular timestamp, as well as for defining a 
"mergeable" delta input (note we have a merge now, but it does a simple 
sequential merge of delta stores with the assumption that the input iterators 
are disjoint by timestamp, which may not be the case if we have transactional 
delta trackers that overlap in time with the main delta tracker).

The DeltaReaders for the DRSs should consider the transaction's finalized 
commit timestamp (or lack thereof) in the same way that the MRS iterator 
considers mutations in the context of a snapshot.

  was:
Kudu currently only supports multi-row, multi-partition transactions for INSERT 
and INSERT_IGNORE operations. We should consider extending the Kudu tablet 
store to support:
 - Tracking Mutations in the MRS that are associated with a transaction
 - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per 
rowset per transaction. These delta trackers should be merged with the main 
delta trackers of each DRS. I'm not sure if it will be helpful, but I have a 
patch to encapsulate some of the delta applying logic – I suspect it might be 
useful in defining a delta iterator that spits out delta keys with a singular 
timestamp, as well as for defining a "mergeable" delta input (note we have a 
merge now, but it does a simple sequential merge of delta stores with the 
assumption that the input iterators are disjoint by timestamp, which may not be 
the case if we have transactional delta trackers that overlap in time with the 
main delta tracker).

The DeltaReaders for the DRSs should consider the transaction's finalized 
commit timestamp (or lack thereof) in the same way that the MRS iterator 
considers mutations in the context of a snapshot.


> Support updates and deletes in transactions
> ---
>
> Key: KUDU-3261
> URL: https://issues.apache.org/jira/browse/KUDU-3261
> Project: Kudu
>  Issue Type: Improvement
>  Components: tablet, transactions
>Reporter: Andrew Wong
>Priority: Major
>
> Kudu currently only supports multi-row, multi-partition transactions for 
> INSERT and INSERT_IGNORE operations. We should consider extending the Kudu 
> tablet store to support:
>  - Tracking Mutations in the MRS that are associated with a transaction
>  - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per 
> rowset per transaction. These delta trackers should be merged with the main 
> delta trackers of each DRS. I'm not sure if it will be helpful, but I have [a 
> patch|https://gerrit.cloudera.org/c/16387/] to encapsulate some of the delta 
> applying logic – I suspect it might be useful in defining a delta iterator 
> that spits out delta keys with a singular timestamp, as well as for defining 
> a "mergeable" delta input (note we have a merge now, but it does a simple 
> sequential merge of delta stores with the assumption that the input iterators 
> are disjoint by timestamp, which may not be the case if we have transactional 
> delta trackers that overlap in time with the main delta tracker).
> The DeltaReaders for the DRSs should consider the transaction's finalized 
> commit timestamp (or lack thereof) in the same way that the MRS iterator 
> considers mutations in the context of a snapshot.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3261) Support updates and deletes in transactions

2021-03-11 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3261:
--
Description: 
Kudu currently only supports multi-row, multi-partition transactions for INSERT 
and INSERT_IGNORE operations. We should consider extending the Kudu tablet 
store to support:
 - Tracking Mutations in the MRS that are associated with a transaction
 - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per 
rowset per transaction. These delta trackers should be merged with the main 
delta trackers of each DRS. I'm not sure if it will be helpful, but I have a 
patch to encapsulate some of the delta applying logic – I suspect it might be 
useful in defining a delta iterator that spits out delta keys with a singular 
timestamp, as well as for defining a "mergeable" delta input (note we have a 
merge now, but it does a simple sequential merge of delta stores with the 
assumption that the input iterators are disjoint by timestamp, which may not be 
the case if we have transactional delta trackers that overlap in time with the 
main delta tracker).

The DeltaReaders for the DRSs should consider the transaction's finalized 
commit timestamp (or lack thereof) in the same way that the MRS iterator 
considers mutations in the context of a snapshot.

  was:
Kudu currently only supports multi-row, multi-partition transactions for INSERT 
and INSERT_IGNORE operations. We should consider extending the Kudu tablet 
store to support:
 - Tracking Mutations in the MRS that are associated with a transaction
 - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per 
rowset per transaction

The DeltaReaders for the DRSs should consider the transaction's finalized 
commit timestamp (or lack thereof) in the same way that the MRS iterator 
considers mutations in the context of a snapshot.


> Support updates and deletes in transactions
> ---
>
> Key: KUDU-3261
> URL: https://issues.apache.org/jira/browse/KUDU-3261
> Project: Kudu
>  Issue Type: Improvement
>  Components: tablet, transactions
>Reporter: Andrew Wong
>Priority: Major
>
> Kudu currently only supports multi-row, multi-partition transactions for 
> INSERT and INSERT_IGNORE operations. We should consider extending the Kudu 
> tablet store to support:
>  - Tracking Mutations in the MRS that are associated with a transaction
>  - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per 
> rowset per transaction. These delta trackers should be merged with the main 
> delta trackers of each DRS. I'm not sure if it will be helpful, but I have a 
> patch to encapsulate some of the delta applying logic – I suspect it might be 
> useful in defining a delta iterator that spits out delta keys with a singular 
> timestamp, as well as for defining a "mergeable" delta input (note we have a 
> merge now, but it does a simple sequential merge of delta stores with the 
> assumption that the input iterators are disjoint by timestamp, which may not 
> be the case if we have transactional delta trackers that overlap in time with 
> the main delta tracker).
> The DeltaReaders for the DRSs should consider the transaction's finalized 
> commit timestamp (or lack thereof) in the same way that the MRS iterator 
> considers mutations in the context of a snapshot.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3262) Define more forward-looking feature flags for transactions

2021-03-11 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3262:
-

 Summary: Define more forward-looking feature flags for transactions
 Key: KUDU-3262
 URL: https://issues.apache.org/jira/browse/KUDU-3262
 Project: Kudu
  Issue Type: Improvement
  Components: supportability, transactions
Reporter: Andrew Wong


There are several extensions to transactions that seem reasonable for Kudu's 
roadmap. It's thus worth defining some kind of TransactionOpts, e.g. allowing 
users to define the locking strategy and deadlock control, so we can more easily 
develop new functionality without overhauling the existing transactions API.
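
A hypothetical sketch of what such an options holder could look like; none of 
these names exist in the client API today, they only illustrate how new knobs can 
be added without changing the shape of the existing calls:
{code:cpp}
#include <chrono>
#include <cstdint>

// Hypothetical options struct, for illustration only.
struct TransactionOpts {
  enum class LockingStrategy { kOptimistic, kPessimistic };
  enum class DeadlockPolicy { kWaitDie, kNoWait };

  LockingStrategy locking = LockingStrategy::kOptimistic;
  DeadlockPolicy deadlock = DeadlockPolicy::kNoWait;
  std::chrono::milliseconds keepalive_interval{1000};
  int64_t max_participants = -1;  // -1 means no limit.
};
{code}
New fields get default values, so older callers keep working while new 
functionality is layered on.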



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3261) Support updates and deletes in transactions

2021-03-11 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3261:
-

 Summary: Support updates and deletes in transactions
 Key: KUDU-3261
 URL: https://issues.apache.org/jira/browse/KUDU-3261
 Project: Kudu
  Issue Type: Improvement
  Components: tablet, transactions
Reporter: Andrew Wong


Kudu currently only supports multi-row, multi-partition transactions for INSERT 
and INSERT_IGNORE operations. We should consider extending the Kudu tablet 
store to support:
 - Tracking Mutations in the MRS that are associated with a transaction
 - Maintaining a separate DeltaTracker (i.e. DMS and multiple DeltaFiles) per 
rowset per transaction

The DeltaReaders for the DRSs should consider the transaction's finalized 
commit timestamp (or lack thereof) in the same way that the MRS iterator 
considers mutations in the context of a snapshot.
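
As a purely structural illustration (the type names below are invented and bear 
no relation to the real MemRowSet/DeltaTracker interfaces), "one in-memory store 
per transaction per tablet" roughly means keying uncommitted stores by 
transaction ID next to the main ones:
{code:cpp}
#include <cstdint>
#include <map>
#include <memory>

// Invented stand-in types, for illustration only.
struct TxnInsertStore {};  // per-transaction MRS holding uncommitted inserts
struct TxnDeltaStore {};   // per-transaction DMS holding uncommitted mutations

// Per-rowset bookkeeping: alongside the main store, keep one uncommitted
// store per open transaction; on commit these are merged into the main path
// at the transaction's commit timestamp.
struct RowSetTxnState {
  std::unique_ptr<TxnDeltaStore> main_deltas;
  std::map<int64_t, std::unique_ptr<TxnDeltaStore>> deltas_by_txn_id;
};
{code}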



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3260) Reduce the on-disk footprint of transaction metadata in participants

2021-03-11 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3260:
-

 Summary: Reduce the on-disk footprint of transaction metadata in 
participants
 Key: KUDU-3260
 URL: https://issues.apache.org/jira/browse/KUDU-3260
 Project: Kudu
  Issue Type: Improvement
  Components: server, transactions
Reporter: Andrew Wong


We should remove the commit timestamp and txn metadata once we've flushed all 
in-memory stores that rely on metadata for determining commit timestamp. If we 
see tablets serving more frequent transactions, the persisted metadata may 
become less negligible.

One thing to watch out for here is that we currently use metadata to determine 
whether a transaction has _ever_ existed on the participant – if we simply get 
rid of the metadata, we will lose out on this knowledge. More thought should be 
given to how and when to do this safely, and ensure that we clean up the 
metadata in such a way that no invariants are broken with regards to knowing 
about transaction existence (e.g. perhaps only clean up the txn metadata if the 
corresponding TxnStatusManager has been deleted?).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3259) Define ownership of transactions for participants to prevent malicious users from writing to a transaction

2021-03-11 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3259:
-

 Summary: Define ownership of transactions for participants to 
prevent malicious users from writing to a transaction
 Key: KUDU-3259
 URL: https://issues.apache.org/jira/browse/KUDU-3259
 Project: Kudu
  Issue Type: Improvement
  Components: security, transactions
Reporter: Andrew Wong


Currently, any user can write as a part of a transaction. This isn't 
necessarily safe, though at the very least, Kudu still performs its authz 
checks on every write request that enters the system. When a participant calls 
BEGIN_TXN, we should consider also persisting the username of the writer, which 
should also get validated on the call to RegisterParticipant. Once successful, 
further write requests can be rejected if they are from other users.

Note that calls to the TxnStatusManager are protected in this way (e.g. calls 
to commit or rollback will validate that the caller matches the 'user' field in 
the {{TxnStatusEntryPB}}).

One thing to be cognizant of here is that if we are going to persist more 
metadata per transaction, we should strongly consider ways to reduce the amount 
of metadata stored in a single SuperBlockPB file.
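
A minimal sketch of the participant-side check being proposed, with invented 
names (the real check would live in the write path and use the persisted 
per-transaction metadata):
{code:cpp}
#include <cstdint>
#include <map>
#include <string>

// Username recorded when the participant handled BEGIN_TXN for each txn ID.
std::map<int64_t, std::string> txn_owner;

// Returns true if a transactional write from 'caller' should be accepted.
bool AuthorizeTxnWrite(int64_t txn_id, const std::string& caller) {
  auto it = txn_owner.find(txn_id);
  if (it == txn_owner.end()) {
    return false;  // unknown transaction on this participant
  }
  // Reject writers other than the user who began the transaction.
  return it->second == caller;
}
{code}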



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3258) Expose some kind of transaction dashboard in ksck or the web UI

2021-03-11 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3258:
-

 Summary: Expose some kind of transaction dashboard in ksck or the 
web UI
 Key: KUDU-3258
 URL: https://issues.apache.org/jira/browse/KUDU-3258
 Project: Kudu
  Issue Type: Improvement
  Components: ops-tooling, transactions
Reporter: Andrew Wong


It would be useful to expose the locations and tablet IDs of the 
TxnStatusManager replicas, and even show the health of them from unified front, 
whether that's the web UI, ksck, or both. Some useful things to know about:
 - The tablet ID, range, and location of each TxnStatusManager partition
 - The highest transaction ID per TxnStatusManager partition
 - In-flight (not COMMITTED or ABORTED) transactions and their current state



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3257) Add tooling for operating on transactions

2021-03-11 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3257:
-

 Summary: Add tooling for operating on transactions
 Key: KUDU-3257
 URL: https://issues.apache.org/jira/browse/KUDU-3257
 Project: Kudu
  Issue Type: Improvement
  Components: ops-tooling, transactions
Reporter: Andrew Wong


We should expose transactions to operators who want to observe or even 
interfere with transactions (in case something has already gone wrong). A 
simple tool to wrap the TxnSystemClient seems like a great place to start, 
exposing commands like:

Wrappers for TxnParticipant calls:
- kudu remote_replica begin_txn 
- kudu remote_replica begin_commit 
- kudu remote_replica finalize_commit  (should be used sparingly!)
- kudu remote_replica abort_txn 

Wrappers for the TxnStatusManager calls:
- kudu txns list
- kudu txns show 
- kudu txns start_txn
- kudu txns commit 
- kudu txns rollback 
- kudu txns keep_alive 

Wrappers for operating on the transaction status table:
- kudu txns create_txn_status_table
- kudu txns add_txn_status_table range
- kudu txns drop_txn_status_table range



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3256) Limit the memory usage per transaction per tablet

2021-03-11 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3256:
-

 Summary: Limit the memory usage per transaction per tablet
 Key: KUDU-3256
 URL: https://issues.apache.org/jira/browse/KUDU-3256
 Project: Kudu
  Issue Type: Improvement
  Components: transactions
Reporter: Andrew Wong


Currently, the transactions implementation stores all new inserts in a new MRS 
per transaction per tablet. As transactions get larger and larger, or as there 
are more transactions entering the system, this will result in memory pressure 
across tablet servers. We should explore ways to limit the memory usage per 
transaction, by either enforcing a memory limit per transaction participant, or 
by flushing transactional MRSs before committing, per regular maintenance op 
cadence (e.g. based on memory pressure, MRS size, time since last flush, etc.).

While it'd be significantly more complex, I'm more partial to the latter 
approach – the mechanics to flush an MRS already exist, so why not use them? It 
should be noted though that we would then need to update how bootstrapping is 
handled by persisting a 'last_flushed_mrs_id' per transaction, similar to 
what's done today for non-transactional MRSs. Additionally, the existing code 
to swap in new disk rowsets atomically would need some thought to ensure 
swapping in transactional rowsets while racing with a commit does the right 
thing (i.e. if we flush the transactional MRS while committing, the end result 
is the new DRSs should end up in the main rowset tree).
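
A rough sketch of the flush trigger the second option implies; the names and 
thresholds are made up, but they mirror the triggers listed above (memory 
pressure, MRS size, time since last flush):
{code:cpp}
#include <chrono>
#include <cstdint>

// Hypothetical per-transaction MRS stats; not an existing Kudu structure.
struct TxnMrsStats {
  int64_t bytes_in_memory;
  std::chrono::seconds time_since_last_flush;
};

// Decide whether a transactional MRS should be flushed on the regular
// maintenance cadence, before the transaction commits.
bool ShouldFlushTxnMrs(const TxnMrsStats& stats,
                       bool under_memory_pressure,
                       int64_t size_threshold_bytes = 32 * 1024 * 1024,
                       std::chrono::seconds age_threshold = std::chrono::seconds(300)) {
  return under_memory_pressure ||
         stats.bytes_in_memory >= size_threshold_bytes ||
         stats.time_since_last_flush >= age_threshold;
}
{code}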



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-3213) Java client should attempt a different tablet server when retrying during tserver quiescing

2021-03-09 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-3213.
---
Fix Version/s: 1.15.0
   Resolution: Fixed

> Java client should attempt a different tablet server when retrying during 
> tserver quiescing
> ---
>
> Key: KUDU-3213
> URL: https://issues.apache.org/jira/browse/KUDU-3213
> Project: Kudu
>  Issue Type: Bug
>  Components: java
>Reporter: Andrew Wong
>Priority: Major
> Fix For: 1.15.0
>
>
> One of our clusters ran into the following error message when leaving a 
> tablet server quiesced for an extended period of time:
> {code:java}
> ERROR Runner: Pipeline exception occurred: org.apache.spark.SparkException: 
> Job aborted due to stage failure: Task 1 in stage 6.0 failed 4 times, most 
> recent failure: Lost task 1.3 in stage 6.0 (TID 1922, 
> tserver-018.edh.company.com, executor 58): 
> org.apache.kudu.client.NonRecoverableException: cannot complete before 
> timeout: ScanRequest(scannerId=null, tablet=9e17b554f85f4a7f855771d8b5c913f5, 
> attempt=24, KuduRpc(method=Scan, tablet=9e17b554f85f4a7f855771d8b5c913f5, 
> attempt=24, DeadlineTracker(timeout=3, elapsed=27988), Traces: [0ms] 
> refreshing cache from master, [1ms] Sub RPC GetTableLocations: sending RPC to 
> server master-name-003.edh.company.com:7051, [12ms] Sub RPC 
> GetTableLocations: received response from server 
> master-name-003.edh.company.com:7051: OK, [22ms] sending RPC to server 
> e1a4405443d845249b5ed15c8e882211, [116ms] delaying RPC due to: Service 
> unavailable: Tablet server is quiescing (error 0), [117ms] received response 
> from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
> server is quiescing (error 0), [126ms] sending RPC to server 
> e1a4405443d845249b5ed15c8e882211, [129ms] delaying RPC due to: Service 
> unavailable: Tablet server is quiescing (error 0), [129ms] received response 
> from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
> server is quiescing (error 0), [146ms] sending RPC to server 
> e1a4405443d845249b5ed15c8e882211, [149ms] delaying RPC due to: Service 
> unavailable: Tablet server is quiescing (error 0), [149ms] received response 
> from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
> server is quiescing (error 0), [166ms] sending RPC to server 
> e1a4405443d845249b5ed15c8e882211, [168ms] delaying RPC due to: Service 
> unavailable: Tablet server is quiescing (error 0), [168ms] received response 
> from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
> server is quiescing (error 0), [206ms] sending RPC to server 
> e1a4405443d845249b5ed15c8e882211, [209ms] delaying RPC due to: Service 
> unavailable: Tablet server is quiescing (error 0), [209ms] received response 
> from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
> server is quiescing (error 0), [266ms] sending RPC to server 
> e1a4405443d845249b5ed15c8e882211, [268ms] delaying RPC due to: Service 
> unavailable: Tablet server is quiescing (error 0), [268ms] received response 
> from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
> server is quiescing (error 0), [306ms] sending RPC to server 
> e1a4405443d845249b5ed15c8e882211, [308ms] delaying RPC due to: Service 
> unavailable: Tablet server is quiescing (error 0), [308ms] received response 
> from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
> server is quiescing (error 0), [545ms] sending RPC to server 
> e1a4405443d845249b5ed15c8e882211, [548ms] delaying RPC due to: Service 
> unavailable: Tablet server is quiescing (error 0), [548ms] received response 
> from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
> server is quiescing (error 0), [865ms] sending RPC to server 
> e1a4405443d845249b5ed15c8e882211, [868ms] delaying RPC due to: Service 
> unavailable: Tablet server is quiescing (error 0), [868ms] received response 
> from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
> server is quiescing (error 0), [1266ms] sending RPC to server 
> e1a4405443d845249b5ed15c8e882211, [1269ms] delaying RPC due to: Service 
> unavailable: Tablet server is quiescing (error 0), [1269ms] received response 
> from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
> server is quiescing (error 0), [2626ms] sending RPC to server 
> e1a4405443d845249b5ed15c8e882211, [2628ms] delaying RPC due to: Service 
> unavailable: Tablet server is quiescing (error 0), [2628ms] received response 
> from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
> server is quiescing (error 0), [4746ms] sending RPC to server 
> e1a4405443d845249b5ed15c8e882211, [4749

[jira] [Commented] (KUDU-3252) FsManager initialized contention if two tserver instances try to call CreateInitialFileSystemLayout on the same folder

2021-03-01 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293375#comment-17293375
 ] 

Andrew Wong commented on KUDU-3252:
---

If the issue is the absence of file locking, an implementation of WAL instance 
files (as might be a part of KUDU-2975) that leverages 
{{DirInstanceMetadataFiles}} (see 
[https://github.com/apache/kudu/blob/master/src/kudu/fs/dir_util.h#L74]) could 
help prevent this.
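
To make the file-locking point concrete, here is a minimal POSIX sketch of the 
underlying technique (this is not the DirInstanceMetadataFiles code): hold an 
exclusive, non-blocking lock on an instance file so a second process pointed at 
the same directory fails fast instead of deleting the first one's data.
{code:cpp}
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

#include <cstdio>
#include <string>

// Returns the held file descriptor on success, or -1 if another process
// already holds the lock. The lock is released when the process exits.
int LockInstanceFile(const std::string& path) {
  int fd = open(path.c_str(), O_RDWR | O_CREAT, 0644);
  if (fd < 0) return -1;
  if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
    close(fd);  // e.g. a second tserver started by mistake owns the dir
    return -1;
  }
  return fd;
}

int main() {
  if (LockInstanceFile("/tmp/kudu-wal-instance.lock") < 0) {
    std::fprintf(stderr, "data directory appears to be in use by another process\n");
    return 1;
  }
  // ... safe to proceed with CreateInitialFileSystemLayout / normal startup ...
  return 0;
}
{code}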

> FsManager initialized contention if two tserver instances try to call 
> CreateInitialFileSystemLayout on the same folder 
> ---
>
> Key: KUDU-3252
> URL: https://issues.apache.org/jira/browse/KUDU-3252
> Project: Kudu
>  Issue Type: Bug
>Reporter: Redriver
>Priority: Critical
>
> I scanned the Kudu source code for DeleteDir invocations; there are two 
> places: 
> [https://github.com/apache/kudu/blob/master/src/kudu/fs/fs_manager.cc#L384] 
> and 
> [https://github.com/apache/kudu/blob/master/src/kudu/fs/fs_manager.cc#L485].
> Imagine I start kudu-tserver twice by mistake and only one kudu-tserver 
> starts: is it possible that its folder may be removed by the other 
> kudu-tserver when that one fails to start?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3109) Log administrative operations

2021-01-06 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260074#comment-17260074
 ] 

Andrew Wong commented on KUDU-3109:
---

In addition to this, it'd be helpful to know about any kind of natural 
re-replication that has happened. I was recently looking at a case in which it 
seemed like some bad blocks had replicated (in an older version that didn't 
check checksums) to other servers. Knowing about which tablets were copied from 
which servers would have been helpful. This could have been pieced together via 
the glog output, but the logs didn't go back far enough.

> Log administrative operations
> -
>
> Key: KUDU-3109
> URL: https://issues.apache.org/jira/browse/KUDU-3109
> Project: Kudu
>  Issue Type: Task
>  Components: security
>Reporter: Attila Bukor
>Priority: Minor
>
> Sometimes it's impossible to determine what caused an issue when 
> administrators run unsafe commands on the cluster. Logging these in an audit 
> log would help.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3109) Log administrative operations

2021-01-06 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3109:
--
Issue Type: New Feature  (was: Task)
  Priority: Major  (was: Minor)

> Log administrative operations
> -
>
> Key: KUDU-3109
> URL: https://issues.apache.org/jira/browse/KUDU-3109
> Project: Kudu
>  Issue Type: New Feature
>  Components: security
>Reporter: Attila Bukor
>Priority: Major
>
> Sometimes it's impossible to determine what caused an issue when 
> administrators run unsafe commands on the cluster. Logging these in an audit 
> log would help.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3229) Tooling to examine and operate on log blocks

2021-01-06 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3229:
-

 Summary: Tooling to examine and operate on log blocks
 Key: KUDU-3229
 URL: https://issues.apache.org/jira/browse/KUDU-3229
 Project: Kudu
  Issue Type: Improvement
  Components: fs, ops-tooling
Reporter: Andrew Wong


It's somewhat troublesome to examine the contents of a log block container 
today. Tooling exists in the form of {{kudu pbc dump}} for metadata and 
{{hexdump}} for data, but it'd be nice to have more specialized tooling for 
examining containers to understand things like:
 * What blocks are in this container? When was each block last updated? You can 
piece this together from the {{kudu pbc dump}} on the metadata, but having 
something more tabular might be nice.
 * Does each block actually contain any data? If not, which don't?
 * Does each block have a valid header if it were a CFile block?

Some of the information I'd like to get at falls out of the purview of the log 
block manager itself, and requires information like what kind of blocks we're 
dealing with. But the underlying struggle I'd like to address is: given a 
container, can we be more rigorous about our checks that the data is OK, and 
flag blocks that appear broken? 

The context of this was a (Kudu version 1.5.x) case in which some form of 
corruption occurred, and we were left with containers that appeared to have 
holes punched out of them, resulting in messages complaining about bad CFile 
header magic values of "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" (vs 
the expected "kuducfl2"). The log block metadata and tablet metadata both had 
records of many blocks, but the corresponding locations in the data files were 
all zeroes. It's unclear how this happened, but even just examining the 
containers and blocks therein was not well-documented.
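
As one example of the kind of check such tooling could run (a sketch only; the 
block offsets would have to come from the container metadata, e.g. via {{kudu pbc 
dump}}): read the first eight bytes of a block's extent in the .data file and 
verify they are the CFile magic rather than zeroes.
{code:cpp}
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>

// Returns true if the 8 bytes at 'offset' in the container's .data file
// look like a CFile header magic ("kuducfl2").
bool BlockHasCfileMagic(const std::string& data_file, uint64_t offset) {
  static const char kMagic[] = "kuducfl2";
  std::ifstream in(data_file, std::ios::binary);
  if (!in) return false;
  in.seekg(static_cast<std::streamoff>(offset));
  char buf[8];
  in.read(buf, sizeof(buf));
  return in && std::memcmp(buf, kMagic, sizeof(buf)) == 0;
}
{code}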



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3228) Background process that checks for corrupted block and failed disks

2021-01-06 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3228:
-

 Summary: Background process that checks for corrupted block and 
failed disks
 Key: KUDU-3228
 URL: https://issues.apache.org/jira/browse/KUDU-3228
 Project: Kudu
  Issue Type: Improvement
  Components: cfile, fs, tserver
Reporter: Andrew Wong


Currently, CFile corruption and failed disks will result in any bad tablets 
being marked as failed, being re-replicated elsewhere, and any scans that were 
in progress for them being retried at other servers.

Rather than waiting for the first bad access to do this, we may want to 
implement a background task that checks for corruption and proactively 
re-replicates such tablets. That way, especially when there are long periods of 
client inactivity, we can get the faulty-hardware-related re-replication out of the 
way.

The task should probably only run when the tserver isn't serving many scans or 
writes. It should also avoid polluting the block cache, if attempting to check 
for CFile corruption.

HDFS has a "disk checker" task that may be worth drawing inspiration from.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3227) Improve client error message when not a part of a trusted subnet

2021-01-06 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3227:
-

 Summary: Improve client error message when not a part of a trusted 
subnet
 Key: KUDU-3227
 URL: https://issues.apache.org/jira/browse/KUDU-3227
 Project: Kudu
  Issue Type: Improvement
  Components: client
Reporter: Andrew Wong


I recently saw a case where the Java application spit out this error, failing 
to connect to the cluster:
{code:java}
Caused by: org.apache.kudu.client.NoLeaderFoundException: Master config 
(master:7051) has no leader. Exceptions received: 
org.apache.kudu.client.RecoverableException: connection disconnected
at 
org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:289)
at org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:49)
at 
org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:365)
at 
org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:354)
at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280)
at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1315)
at com.stumbleupon.async.Deferred.doCall(Deferred.java:1286)
{code}
Other clients (e.g. Impala) were able to run similar queries without such 
errors, so it seemed localized to this one application. This was odd given the 
error message complains about not having a Master leader, a property of the 
cluster, not the client.

Inspecting the master logs, it was relatively clear that {{--trusted_subnets}} 
was likely to blame (the server-side warning message that is spit out mentions 
it by name). It would be nice if this was obvious in the clients as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3227) Improve client error message when not a part of a trusted subnet

2021-01-06 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3227:
--
Labels: newbie  (was: )

> Improve client error message when not a part of a trusted subnet
> 
>
> Key: KUDU-3227
> URL: https://issues.apache.org/jira/browse/KUDU-3227
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Reporter: Andrew Wong
>Priority: Major
>  Labels: newbie
>
> I recently saw a case where the Java application spit out this error, failing 
> to connect to the cluster:
> {code:java}
> Caused by: org.apache.kudu.client.NoLeaderFoundException: Master config 
> (master:7051) has no leader. Exceptions received: 
> org.apache.kudu.client.RecoverableException: connection disconnected
> at 
> org.apache.kudu.client.ConnectToCluster.incrementCountAndCheckExhausted(ConnectToCluster.java:289)
> at 
> org.apache.kudu.client.ConnectToCluster.access$100(ConnectToCluster.java:49)
> at 
> org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:365)
> at 
> org.apache.kudu.client.ConnectToCluster$ConnectToMasterErrCB.call(ConnectToCluster.java:354)
> at com.stumbleupon.async.Deferred.doCall(Deferred.java:1280)
> at com.stumbleupon.async.Deferred.runCallbacks(Deferred.java:1259)
> at com.stumbleupon.async.Deferred.handleContinuation(Deferred.java:1315)
> at com.stumbleupon.async.Deferred.doCall(Deferred.java:1286)
> {code}
> Other clients (e.g. Impala) were able to run similar queries without such 
> errors, so it seemed localized to this one application. This was odd given 
> the error message complains about not having a Master leader, a property of 
> the cluster, not the client.
> Inspecting the master logs, it was relatively clear that 
> {{--trusted_subnets}} was likely to blame (the server-side warning message 
> that is spit out mentions it by name). It would be nice if this was obvious 
> in the clients as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3222) std::bad_alloc in full_stack-insert-scan-test.cc

2020-12-14 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249400#comment-17249400
 ] 

Andrew Wong commented on KUDU-3222:
---

FWIW we did have a run succeed on commit 
420b07e6490e14f26107088bc1b09866b6d43bba. If this was caused by a code change, 
it is likely the gperftools bump 713fee390d0241bf466f490d5b2c678f7ebe5175, 
since the glog change was reverted.

> std::bad_alloc in full_stack-insert-scan-test.cc
> 
>
> Key: KUDU-3222
> URL: https://issues.apache.org/jira/browse/KUDU-3222
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Reporter: Andrew Wong
>Priority: Major
> Attachments: FullStackScanInsertMRSOnly3.log
>
>
> Recently we've been starting to see the following in our runs of 
> full_stack-insert-scan-test:
> {code:java}
> I1214 13:30:32.995853 39072 full_stack-insert-scan-test.cc:271] Insertion 
> thread 7 of 50 is 69% done.
> terminate called after throwing an instance of 'std::bad_alloc'
>   what():  std::bad_alloc
> *** Aborted at 1607981433 (unix time) try "date -d @1607981433" if you are 
> using GNU date ***
> PC: @   0x3f85032625 __GI_raise
> *** SIGABRT (@0x11569802) received by PID 38914 (TID 0x7f81b4a02700) from 
> PID 38914; stack trace: ***
> @   0xcf8a21 google::(anonymous namespace)::FailureSignalHandler()
> @   0x3f8540f710 (unknown)
> @   0x3f85032625 __GI_raise
> @   0x3f85033e05 __GI_abort
> @   0x3f884bea7d __gnu_cxx::__verbose_terminate_handler()
> @   0x3f884bcbd6 (unknown)
> @   0x3f884bcc03 std::terminate()
> @   0x3f884bcd22 __cxa_throw
> @   0xd14bd5 (anonymous namespace)::handle_oom()
> @  0x2ff3872 tcmalloc::allocate_full_cpp_throw_oom()
> @  0x2cd4c1a 
> _ZNSt6vectorIN4kudu19DecodedRowOperationESaIS1_EE17_M_realloc_insertIJRKS1_EEEvN9__gnu_cxx17__normal_iteratorIPS1_S3_EEDpOT_
> @  0x2cd535a kudu::RowOperationsPBDecoder::DecodeOperations<>()
> @  0x131c2a6 kudu::tablet::Tablet::DecodeWriteOperations()
> I1214 13:30:33.075912 39094 full_stack-insert-scan-test.cc:271] Insertion 
> thread 29 of 50 is 69% done.
> @  0x135bcb6 kudu::tablet::WriteOp::Prepare()
> @  0x13514ac kudu::tablet::OpDriver::Prepare()
> @  0x13520ed 
> _ZNSt17_Function_handlerIFvvEZN4kudu6tablet8OpDriver12ExecuteAsyncEvEUlvE_E9_M_invokeERKSt9_Any_data
> @  0x2e2409e kudu::ThreadPool::DispatchThread()
> @  0x2e1b2c5 kudu::Thread::SuperviseThread()
> @   0x3f854079d1 start_thread
> @   0x3f850e88fd clone {code}
> This runs as a part of a suite of several tests in {{scripts/benchmarks.sh}}.
> This started happening fairly consistently starting around commit 
> 2943aa701ee092158c2084c614a91f92513ef7c4, when we bumped glog and gperftools, 
> though I'm not sure they are directly related here.
> The attached logs are a run on CentOS 6.6, with around 100GB of memory.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3222) std::bad_alloc in full_stack-insert-scan-test.cc

2020-12-14 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3222:
-

 Summary: std::bad_alloc in full_stack-insert-scan-test.cc
 Key: KUDU-3222
 URL: https://issues.apache.org/jira/browse/KUDU-3222
 Project: Kudu
  Issue Type: Bug
  Components: test
Reporter: Andrew Wong
 Attachments: FullStackScanInsertMRSOnly3.log

Recently we've been starting to see the following in our runs of 
full_stack-insert-scan-test:
{code:java}
I1214 13:30:32.995853 39072 full_stack-insert-scan-test.cc:271] Insertion 
thread 7 of 50 is 69% done.
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
*** Aborted at 1607981433 (unix time) try "date -d @1607981433" if you are 
using GNU date ***
PC: @   0x3f85032625 __GI_raise
*** SIGABRT (@0x11569802) received by PID 38914 (TID 0x7f81b4a02700) from 
PID 38914; stack trace: ***
@   0xcf8a21 google::(anonymous namespace)::FailureSignalHandler()
@   0x3f8540f710 (unknown)
@   0x3f85032625 __GI_raise
@   0x3f85033e05 __GI_abort
@   0x3f884bea7d __gnu_cxx::__verbose_terminate_handler()
@   0x3f884bcbd6 (unknown)
@   0x3f884bcc03 std::terminate()
@   0x3f884bcd22 __cxa_throw
@   0xd14bd5 (anonymous namespace)::handle_oom()
@  0x2ff3872 tcmalloc::allocate_full_cpp_throw_oom()
@  0x2cd4c1a 
_ZNSt6vectorIN4kudu19DecodedRowOperationESaIS1_EE17_M_realloc_insertIJRKS1_EEEvN9__gnu_cxx17__normal_iteratorIPS1_S3_EEDpOT_
@  0x2cd535a kudu::RowOperationsPBDecoder::DecodeOperations<>()
@  0x131c2a6 kudu::tablet::Tablet::DecodeWriteOperations()
I1214 13:30:33.075912 39094 full_stack-insert-scan-test.cc:271] Insertion 
thread 29 of 50 is 69% done.
@  0x135bcb6 kudu::tablet::WriteOp::Prepare()
@  0x13514ac kudu::tablet::OpDriver::Prepare()
@  0x13520ed 
_ZNSt17_Function_handlerIFvvEZN4kudu6tablet8OpDriver12ExecuteAsyncEvEUlvE_E9_M_invokeERKSt9_Any_data
@  0x2e2409e kudu::ThreadPool::DispatchThread()
@  0x2e1b2c5 kudu::Thread::SuperviseThread()
@   0x3f854079d1 start_thread
@   0x3f850e88fd clone {code}

This runs as a part of a suite of several tests in {{scripts/benchmarks.sh}}.
This started happening fairly consistently starting around commit 
2943aa701ee092158c2084c614a91f92513ef7c4, when we bumped glog and gperftools, 
though I'm not sure they are directly related here.

The attached logs are a run on CentOS 6.6, with around 100GB of memory.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-2726) Very large tablets defeat budgeted compaction

2020-12-12 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248493#comment-17248493
 ] 

Andrew Wong commented on KUDU-2726:
---

I'm a bit hesitant to entirely move consideration of the maintenance 
adjustments into stage 1 – it seems like these are used for prioritizing the 
ops that would have been done, rather than defining whether or not an op is 
worth performing. With that distinction, we should try to introduce a solution 
that tackles the latter without affecting the former.

That said, I wouldn't be against introducing further improvements to stage 1.

Introducing some manually-defined value similar to {{maintenance_priority}} and 
{{maintenance_op_multiplier}} sounds like an OK solution in that some users may 
already be familiar with the existing multipliers. I'm not personally a fan of it 
because picking correct values for these configurations seems unintuitive, but 
I know there are Kudu users who do find this configuration effective.

 

Another solution would be to have stage 1 also account for the size of a 
tablet: if a tablet is very large, increase the compaction performance score. 
An observation here is that compacting 128MiB worth of data in a single 50GiB 
tablet may result in a compaction perf score of below 0.01, despite the average 
rowset height being relatively high. If instead we imagined the tablet were 
actually two 25GiB tablets, a 128MiB compaction may result in a higher perf 
score.

Based on this observation, rather than running the budgeted compaction policy 
against the entire tablet, we could run it on multiple subsets of the tablet. 
For instance, if we have a 50GiB tablet, define some window W=25GiB such that 
before running the compaction scoring/selection, if the tablet is over size W, 
we split the input rowsets into 50/W = 2 separate sets of rowsets, run the 
compaction scoring/selection algorithm on both of these sets, and pick the best 
perf scores among the sets. This would mean {{compaction_minimum_improvement}} 
would no longer apply to the entire tablet, but rather it would apply to 
W-sized chunks of the tablet.

If going down the route I'm describing, there needs to be more thought given to 
ensuring this doesn't introduce some never-ending compaction loop, but I think 
the solution is a somewhat elegant workaround for the fact that Kudu doesn't 
support tablet splits today.
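
To make the windowing idea concrete, a minimal sketch (the types and the scoring 
stand-in are invented; the real policy is the knapsack-based budgeted compaction 
code): when a tablet exceeds the window size W, score W-sized contiguous groups 
of the key-ordered rowsets independently and take the best result.
{code:cpp}
#include <algorithm>
#include <cstdint>
#include <vector>

struct RowSetView { int64_t size_bytes; };  // invented stand-in for RowSetInfo

// Placeholder for the existing budgeted compaction score over one set of
// rowsets; the real implementation solves a knapsack over overlapping rowsets.
double ScoreCompaction(const std::vector<RowSetView>& rowsets) {
  return rowsets.empty() ? 0.0 : static_cast<double>(rowsets.size());
}

// Run the existing scoring over W-sized windows of the key-ordered rowsets
// and return the best score, instead of scoring the whole tablet at once.
double BestWindowedScore(const std::vector<RowSetView>& rowsets,
                         int64_t window_bytes) {
  double best = 0.0;
  std::vector<RowSetView> window;
  int64_t window_size = 0;
  for (const auto& rs : rowsets) {
    window.push_back(rs);
    window_size += rs.size_bytes;
    if (window_size >= window_bytes) {
      best = std::max(best, ScoreCompaction(window));
      window.clear();
      window_size = 0;
    }
  }
  if (!window.empty()) best = std::max(best, ScoreCompaction(window));
  return best;
}
{code}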

> Very large tablets defeat budgeted compaction
> -
>
> Key: KUDU-2726
> URL: https://issues.apache.org/jira/browse/KUDU-2726
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: William Berkeley
>Priority: Major
>  Labels: density, roadmap-candidate
>
> On very large tablets (50GB+), despite being very uncompacted with a large 
> average rowset height, a default budget (128MB) worth of compaction may not 
> reduce average rowset height enough to pass the minimum threshold. Thus the 
> tablet stays uncompacted forever.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-613) Scan-resistant cache replacement algorithm for the block cache

2020-12-10 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247540#comment-17247540
 ] 

Andrew Wong commented on KUDU-613:
--

The Apache Impala project has pulled in Kudu's block cache implementation and 
[extended it with LIRS|https://gerrit.cloudera.org/c/15306/]. It's probably 
worth pulling those bits in and seeing how they fare against contentious 
large-scan workloads in Kudu.

LIRS: 
[http://web.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-02-6.pdf]

 

> Scan-resistant cache replacement algorithm for the block cache
> --
>
> Key: KUDU-613
> URL: https://issues.apache.org/jira/browse/KUDU-613
> Project: Kudu
>  Issue Type: Improvement
>  Components: perf
>Affects Versions: M4.5
>Reporter: Andrew Wang
>Priority: Major
>  Labels: roadmap-candidate
>
> The block cache currently uses LRU, which is vulnerable to large scan 
> workloads. It'd be good to implement something like 2Q.
> ARC (patent encumbered, but good for ideas):
> https://www.usenix.org/conference/fast-03/arc-self-tuning-low-overhead-replacement-cache
> HBase (2Q like):
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3220) Pre-commit appears to drop files sometimes

2020-12-08 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3220:
-

 Summary: Pre-commit appears to drop files sometimes
 Key: KUDU-3220
 URL: https://issues.apache.org/jira/browse/KUDU-3220
 Project: Kudu
  Issue Type: Bug
  Components: project-infra
Reporter: Andrew Wong
 Attachments: consoleText.txt

I had a DEBUG precommit job fail because some built artifacts can't be found, 
even though they were just built.
{code:java}
[ 35%] Building CXX object 
src/kudu/rpc/CMakeFiles/krpc_exported.dir/negotiation.cc.o
[ 35%] Building CXX object 
src/kudu/security/CMakeFiles/security_test_util.dir/test/test_certs.cc.o
[ 35%] Building CXX object src/kudu/fs/CMakeFiles/kudu_fs.dir/fs_report.cc.o
...
c++: error: CMakeFiles/kudu_fs.dir/dir_util.cc.o: No such file or directory
c++: error: CMakeFiles/kudu_fs.dir/error_manager.cc.o: No such file or directory
c++: error: CMakeFiles/kudu_fs.dir/file_block_manager.cc.o: No such file or 
directory
c++: error: CMakeFiles/kudu_fs.dir/fs_manager.cc.o: No such file or directory
c++: error: CMakeFiles/kudu_fs.dir/fs_report.cc.o: No such file or directory 
{code}

There was another DEBUG build running concurrently, but it appeared to land in 
a different workspace. I've retriggered the job and I don't expect it's related 
to my patch, but I'm opening this ticket in case others see similar issues on 
the pre-commit infra.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-3108) Tablet server crashes when handle diffscan request

2020-11-30 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-3108.
---
Fix Version/s: 1.14.0
   Resolution: Fixed

> Tablet server crashes when handle diffscan request 
> ---
>
> Key: KUDU-3108
> URL: https://issues.apache.org/jira/browse/KUDU-3108
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: YifanZhang
>Priority: Major
> Fix For: 1.14.0
>
>
> When we did an incremental backup for tables in a cluster with 20 tservers,  
> 3 tservers crashed, coredump stacks are the same:
> {code:java}
> Program terminated with signal 11, Segmentation fault.
> #0  kudu::Schema::Compare 
> (this=0x25b883680, lhs=..., rhs=...) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
> 267 /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h: No such file 
> or directory.
> Missing separate debuginfos, use: debuginfo-install 
> bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64 
> cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64 
> cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 elfutils-libelf-0.166-2.el7.x86_64 
> elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64 
> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 
> libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 
> libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64 
> libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-6.el7.x86_64 
> ncurses-libs-5.9-13.20130511.el7.x86_64 
> nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 
> openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64 
> systemd-libs-219-30.el7_3.8.x86_64 xz-libs-5.2.2-1.el7.x86_64 
> zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0  kudu::Schema::Compare 
> (this=0x25b883680, lhs=..., rhs=...) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
> #1  0x01da51fb in kudu::MergeIterator::RefillHotHeap 
> (this=this@entry=0x78f6ec500) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:720
> #2  0x01da622b in kudu::MergeIterator::AdvanceAndReheap 
> (this=this@entry=0x78f6ec500, state=0xd1661a000, 
> num_rows_to_advance=num_rows_to_advance@entry=1)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:690
> #3  0x01da7927 in kudu::MergeIterator::MaterializeOneRow 
> (this=this@entry=0x78f6ec500, dst=dst@entry=0x7f0d5cc9ffc0, 
> dst_row_idx=dst_row_idx@entry=0x7f0d5cc9fbb0)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:894
> #4  0x01da7de3 in kudu::MergeIterator::NextBlock (this=0x78f6ec500, 
> dst=0x7f0d5cc9ffc0) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:796
> #5  0x00a9ff19 in kudu::tablet::Tablet::Iterator::NextBlock 
> (this=, dst=) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/tablet.cc:2499
> #6  0x0095475c in 
> kudu::tserver::TabletServiceImpl::HandleContinueScanRequest 
> (this=this@entry=0x53b5a90, req=req@entry=0x7f0d5cca0720,     
> rpc_context=rpc_context@entry=0x5e512a460, 
> result_collector=result_collector@entry=0x7f0d5cca0a00, 
> has_more_results=has_more_results@entry=0x7f0d5cca0886,     
> error_code=error_code@entry=0x7f0d5cca0888) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2565
> #7  0x00966564 in 
> kudu::tserver::TabletServiceImpl::HandleNewScanRequest 
> (this=this@entry=0x53b5a90, replica=0xf5c0189c0, req=req@entry=0x2a15c240,    
>  rpc_context=rpc_context@entry=0x5e512a460, 
> result_collector=result_collector@entry=0x7f0d5cca0a00, 
> scanner_id=scanner_id@entry=0x7f0d5cca0940,     
> snap_timestamp=snap_timestamp@entry=0x7f0d5cca0950, 
> has_more_results=has_more_results@entry=0x7f0d5cca0886, 
> error_code=error_code@entry=0x7f0d5cca0888)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2476
> #8  0x00967f4b in kudu::tserver::TabletServiceImpl::Scan 
> (this=0x53b5a90, req=0x2a15c240, resp=0x56f9be6c0, context=0x5e512a460)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1674
> #9  0x01d2e449 in operator() (__args#2=0x5e512a460, 
> __args#1=0x56f9be6c0, __args#0=, this=0x497ecdd8) at 
> /usr/include/c++/4.8.2/functional:2471
> #10 kudu::rpc::GeneratedServiceIf::Handle (this=0x53b5a90, call= out>) at /home/zhangyifan8/work/kudu-xm/src/kudu/rpc/service_if.

[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2020-11-24 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238400#comment-17238400
 ] 

Andrew Wong commented on KUDU-2038:
---

There is a patch for bitmap indexing out, but I don't think it is being 
actively worked on right now: https://gerrit.cloudera.org/c/11722/. It is 
something that I have wanted to revisit, but haven't had the time to prioritize 
recently.

KUDU-3033 is another ticket that I think would be really helpful for reducing 
IO for selective predicates, but again I'm unaware of anyone working on it. If 
you're interested in picking up either feature, I'd be happy to help design and 
review.

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Task
>Reporter: Yi Guolei
>Priority: Major
>  Labels: roadmap-candidate
>
> Do we have a plan to add index on any column [not primary column] ? Currently 
> kudu does not have btree or inverted index on columns. In this case if a 
> query wants to filter a column then kudu has to scan all data in all 
> rowsets. 
> For example, select * from table where salary > 1 and age < 40, the bloom 
> filter or min max index will have no effect, kudu has to scan all data in 
> all row sets. But if kudu has inverted index, then it will be much faster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KUDU-3108) Tablet server crashes when handle diffscan request

2020-11-21 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223996#comment-17223996
 ] 

Andrew Wong edited comment on KUDU-3108 at 11/21/20, 8:24 AM:
--

I've been doing some fuzz testing using {{fuzz-itest.cc}} and reproduced this 
crash with the following sequence.
{code:java}
TEST_F(FuzzTest, Kudu3108) {
  CreateTabletAndStartClusterWithSchema(CreateKeyValueTestSchema());
  RunFuzzCase({
{TEST_INSERT, 1},
{TEST_FLUSH_OPS},
{TEST_FLUSH_TABLET},
{TEST_INSERT_IGNORE_PK_ONLY, 3},
{TEST_DELETE, 1},
{TEST_FLUSH_OPS},
{TEST_FLUSH_TABLET},
{TEST_UPSERT, 3},
{TEST_UPSERT_PK_ONLY, 1},
{TEST_INSERT, 0},
{TEST_FLUSH_OPS},
{TEST_UPDATE_IGNORE, 0},
{TEST_UPDATE, 3},
{TEST_FLUSH_OPS},
{TEST_DIFF_SCAN, 5, 15},
  });
}
{code}
This results in the following crash:
{code:java}
F1030 21:16:58.411253 40800 schema.h:706] Check failed: 
KeyEquals(*lhs.schema()) && KeyEquals(*rhs.schema())
*** Check failure stack trace: ***
*** Aborted at 1604117818 (unix time) try "date -d @1604117818" if you are 
using GNU date ***
PC: @ 0x7f701fcf11d7 __GI_raise
*** SIGABRT (@0x11179efd) received by PID 40701 (TID 0x7f6ff0f47700) from 
PID 40701; stack trace: ***
@ 0x7f7026a70370 (unknown)
@ 0x7f701fcf11d7 __GI_raise
@ 0x7f701fcf28c8 __GI_abort
@ 0x7f70224377b9 google::logging_fail()
@ 0x7f7022438f8d google::LogMessage::Fail()
@ 0x7f702243aee3 google::LogMessage::SendToLog()
@ 0x7f7022438ae9 google::LogMessage::Flush()
@ 0x7f702243b86f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f702cc99fbc kudu::Schema::Compare<>()
@ 0x7f7026167cfd kudu::MergeIterator::RefillHotHeap()
@ 0x7f7026167357 kudu::MergeIterator::AdvanceAndReheap()
@ 0x7f7026169617 kudu::MergeIterator::MaterializeOneRow()
@ 0x7f70261688e9 kudu::MergeIterator::NextBlock()
@ 0x7f702cbddd9b kudu::tablet::Tablet::Iterator::NextBlock()
@ 0x7f70317bcab3 
kudu::tserver::TabletServiceImpl::HandleContinueScanRequest()
@ 0x7f70317bb857 
kudu::tserver::TabletServiceImpl::HandleNewScanRequest()
@ 0x7f70317b464e kudu::tserver::TabletServiceImpl::Scan()
@ 0x7f702ddfd762 
_ZZN4kudu7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE4_clESG_SH_SJ_
@ 0x7f702de0064d 
_ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E4_E9_M_invokeERKSt9_Any_dataS4_S5_S9_
@ 0x7f702b4ddcc2 std::function<>::operator()()
@ 0x7f702b4dd6ed kudu::rpc::GeneratedServiceIf::Handle()
@ 0x7f702b4dfff8 kudu::rpc::ServicePool::RunThread()
@ 0x7f702b4de8c5 _ZZN4kudu3rpc11ServicePool4InitEiENKUlvE_clEv
@ 0x7f702b4e0337 
_ZNSt17_Function_handlerIFvvEZN4kudu3rpc11ServicePool4InitEiEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x7f7033524b9c std::function<>::operator()()
@ 0x7f70248227e0 kudu::Thread::SuperviseThread()
@ 0x7f7026a68dc5 start_thread
@ 0x7f701fdb376d __clone
Aborted
{code}

I haven't fully grokked this sequence, but I will look into this in the coming 
days.


was (Author: andrew.wong):
I've been doing some fuzz testing using {{fuzz-itest.cc}} and reproduced this 
crash with the following sequence (ignore the \{{-1}}s – their functionality is 
not committed yet).
{code:java}
TEST_F(FuzzTest, Kudu3108) {
  CreateTabletAndStartClusterWithSchema(CreateKeyValueTestSchema());
  RunFuzzCase({
{TEST_INSERT, 1, -1},
{TEST_FLUSH_OPS, -1},
{TEST_FLUSH_TABLET},
{TEST_INSERT_IGNORE, 3, -1},
{TEST_DELETE, 1},
{TEST_FLUSH_OPS, -1},
{TEST_FLUSH_TABLET},
{TEST_UPSERT, 3},
{TEST_UPSERT_PK_ONLY, 1},
{TEST_INSERT, 0, -1},
{TEST_FLUSH_OPS, -1},
{TEST_UPDATE_IGNORE, 0},
{TEST_UPDATE, 3},
{TEST_FLUSH_OPS, -1},
{TEST_DIFF_SCAN, 5, 15},
  });
}
{code}
This results in the following crash:
{code:java}
F1030 21:16:58.411253 40800 schema.h:706] Check failed: 
KeyEquals(*lhs.schema()) && KeyEquals(*rhs.schema())
*** Check failure stack trace: ***
*** Aborted at 1604117818 (unix time) try "date -d @1604117818" if you are 
using GNU date ***
PC: @ 0x7f701fcf11d7 __GI_raise
*** SIGABRT (@0x11179efd) received by PID 40701 (TID 0x7f6ff0f47700) from 
PID 40701; stack trace: ***
@ 0x7f7026a70370 (unknown)
@ 0x7f701fcf11d7 __GI_raise
@ 0x7f701fcf28c8 __GI_abort
@ 0x7f70224377b9 google::logging_fail()
@ 0x7f7022438f8d google::LogMessage::Fail()
@ 0x7f702243aee3 google::LogMessage::SendToLog()
@ 0x7f7022438ae9 google::LogMessage::Flush()
@ 0x7f702243b86f

[jira] [Created] (KUDU-3213) Java client should attempt a different tablet server when retrying during tserver quiescing

2020-11-12 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3213:
-

 Summary: Java client should attempt a different tablet server when 
retrying during tserver quiescing
 Key: KUDU-3213
 URL: https://issues.apache.org/jira/browse/KUDU-3213
 Project: Kudu
  Issue Type: Bug
  Components: java
Reporter: Andrew Wong


One of our clusters ran into the following error message when leaving a tablet 
server quiesced for an extended period of time:
{code:java}
ERROR Runner: Pipeline exception occurred: org.apache.spark.SparkException: Job 
aborted due to stage failure: Task 1 in stage 6.0 failed 4 times, most recent 
failure: Lost task 1.3 in stage 6.0 (TID 1922, tserver-018.edh.company.com, 
executor 58): org.apache.kudu.client.NonRecoverableException: cannot complete 
before timeout: ScanRequest(scannerId=null, 
tablet=9e17b554f85f4a7f855771d8b5c913f5, attempt=24, KuduRpc(method=Scan, 
tablet=9e17b554f85f4a7f855771d8b5c913f5, attempt=24, 
DeadlineTracker(timeout=3, elapsed=27988), Traces: [0ms] refreshing cache 
from master, [1ms] Sub RPC GetTableLocations: sending RPC to server 
master-name-003.edh.company.com:7051, [12ms] Sub RPC GetTableLocations: 
received response from server master-name-003.edh.company.com:7051: OK, [22ms] 
sending RPC to server e1a4405443d845249b5ed15c8e882211, [116ms] delaying RPC 
due to: Service unavailable: Tablet server is quiescing (error 0), [117ms] 
received response from server e1a4405443d845249b5ed15c8e882211: Service 
unavailable: Tablet server is quiescing (error 0), [126ms] sending RPC to 
server e1a4405443d845249b5ed15c8e882211, [129ms] delaying RPC due to: Service 
unavailable: Tablet server is quiescing (error 0), [129ms] received response 
from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
server is quiescing (error 0), [146ms] sending RPC to server 
e1a4405443d845249b5ed15c8e882211, [149ms] delaying RPC due to: Service 
unavailable: Tablet server is quiescing (error 0), [149ms] received response 
from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
server is quiescing (error 0), [166ms] sending RPC to server 
e1a4405443d845249b5ed15c8e882211, [168ms] delaying RPC due to: Service 
unavailable: Tablet server is quiescing (error 0), [168ms] received response 
from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
server is quiescing (error 0), [206ms] sending RPC to server 
e1a4405443d845249b5ed15c8e882211, [209ms] delaying RPC due to: Service 
unavailable: Tablet server is quiescing (error 0), [209ms] received response 
from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
server is quiescing (error 0), [266ms] sending RPC to server 
e1a4405443d845249b5ed15c8e882211, [268ms] delaying RPC due to: Service 
unavailable: Tablet server is quiescing (error 0), [268ms] received response 
from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
server is quiescing (error 0), [306ms] sending RPC to server 
e1a4405443d845249b5ed15c8e882211, [308ms] delaying RPC due to: Service 
unavailable: Tablet server is quiescing (error 0), [308ms] received response 
from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
server is quiescing (error 0), [545ms] sending RPC to server 
e1a4405443d845249b5ed15c8e882211, [548ms] delaying RPC due to: Service 
unavailable: Tablet server is quiescing (error 0), [548ms] received response 
from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
server is quiescing (error 0), [865ms] sending RPC to server 
e1a4405443d845249b5ed15c8e882211, [868ms] delaying RPC due to: Service 
unavailable: Tablet server is quiescing (error 0), [868ms] received response 
from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
server is quiescing (error 0), [1266ms] sending RPC to server 
e1a4405443d845249b5ed15c8e882211, [1269ms] delaying RPC due to: Service 
unavailable: Tablet server is quiescing (error 0), [1269ms] received response 
from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
server is quiescing (error 0), [2626ms] sending RPC to server 
e1a4405443d845249b5ed15c8e882211, [2628ms] delaying RPC due to: Service 
unavailable: Tablet server is quiescing (error 0), [2628ms] received response 
from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
server is quiescing (error 0), [4746ms] sending RPC to server 
e1a4405443d845249b5ed15c8e882211, [4749ms] delaying RPC due to: Service 
unavailable: Tablet server is quiescing (error 0), [4749ms] received response 
from server e1a4405443d845249b5ed15c8e882211: Service unavailable: Tablet 
server is quiescing (error 0), [8206ms] sending RPC to server 
e1a4405443d845249b5ed15c8e882211, [8209ms] delaying RPC due to: Service 
unavailable: Tablet server is quiescing (error 0), [8209ms] received response 
from server e1a4405443d845249b5ed15c8e882

[jira] [Comment Edited] (KUDU-3108) Tablet server crashes when handle diffscan request

2020-10-30 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223997#comment-17223997
 ] 

Andrew Wong edited comment on KUDU-3108 at 10/31/20, 4:35 AM:
--

Still not an explanation, but if I change the last row to {{\{TEST_DIFF_SCAN, 
5, 10\}}} the test doesn't pass, but it at least doesn't crash.


was (Author: andrew.wong):
Still not an explanation, but if I change the last row to {{{TEST_DIFF_SCAN, 5, 
10}}} the test doesn't pass, but it at least doesn't crash.

> Tablet server crashes when handle diffscan request 
> ---
>
> Key: KUDU-3108
> URL: https://issues.apache.org/jira/browse/KUDU-3108
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: YifanZhang
>Priority: Major
>
> When we ran an incremental backup of tables in a cluster with 20 tservers, 
> 3 tservers crashed; the coredump stacks are the same:
> {code:java}
> Unable to find source-code formatter for language: shell. Available languages 
> are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, 
> groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, 
> perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, 
> yamlProgram terminated with signal 11, Segmentation fault.Program terminated 
> with signal 11, Segmentation fault.
> #0  kudu::Schema::Compare 
> (this=0x25b883680, lhs=..., rhs=...) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
> 267 /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h: No such file 
> or directory.
> Missing separate debuginfos, use: debuginfo-install 
> bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64 
> cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64 
> cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 elfutils-libelf-0.166-2.el7.x86_64 
> elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64 
> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 
> libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 
> libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64 
> libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-6.el7.x86_64 
> ncurses-libs-5.9-13.20130511.el7.x86_64 
> nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 
> openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64 
> systemd-libs-219-30.el7_3.8.x86_64 xz-libs-5.2.2-1.el7.x86_64 
> zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0  kudu::Schema::Compare 
> (this=0x25b883680, lhs=..., rhs=...) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
> #1  0x01da51fb in kudu::MergeIterator::RefillHotHeap 
> (this=this@entry=0x78f6ec500) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:720
> #2  0x01da622b in kudu::MergeIterator::AdvanceAndReheap 
> (this=this@entry=0x78f6ec500, state=0xd1661a000, 
> num_rows_to_advance=num_rows_to_advance@entry=1)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:690
> #3  0x01da7927 in kudu::MergeIterator::MaterializeOneRow 
> (this=this@entry=0x78f6ec500, dst=dst@entry=0x7f0d5cc9ffc0, 
> dst_row_idx=dst_row_idx@entry=0x7f0d5cc9fbb0)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:894
> #4  0x01da7de3 in kudu::MergeIterator::NextBlock (this=0x78f6ec500, 
> dst=0x7f0d5cc9ffc0) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:796
> #5  0x00a9ff19 in kudu::tablet::Tablet::Iterator::NextBlock 
> (this=, dst=) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/tablet.cc:2499
> #6  0x0095475c in 
> kudu::tserver::TabletServiceImpl::HandleContinueScanRequest 
> (this=this@entry=0x53b5a90, req=req@entry=0x7f0d5cca0720,     
> rpc_context=rpc_context@entry=0x5e512a460, 
> result_collector=result_collector@entry=0x7f0d5cca0a00, 
> has_more_results=has_more_results@entry=0x7f0d5cca0886,     
> error_code=error_code@entry=0x7f0d5cca0888) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2565
> #7  0x00966564 in 
> kudu::tserver::TabletServiceImpl::HandleNewScanRequest 
> (this=this@entry=0x53b5a90, replica=0xf5c0189c0, req=req@entry=0x2a15c240,    
>  rpc_context=rpc_context@entry=0x5e512a460, 
> result_collector=result_collector@entry=0x7f0d5cca0a00, 
> scanner_id=scanner_id@entry=0x7f0d5cca0940,     
> snap_timestamp=snap_timestamp@entry=0x7f0d5cca0950, 
> has_more_results=has_more_results@entry=0x7f0d5cca0886, 
> error_code=error_code@entry=0x7f0d5cca0888)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2476
> #8  0x00967f4b in kudu::tserver::TabletServiceImpl::Scan 
> (this=0x53b5a90, req=0x2a15c240, resp=0x56f9be6c0, context=0x5e512a460)    at 
> /home/zhangyifan8/work/

[jira] [Commented] (KUDU-3108) Tablet server crashes when handle diffscan request

2020-10-30 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223997#comment-17223997
 ] 

Andrew Wong commented on KUDU-3108:
---

Still not an explanation, but if I change the last row to {{{TEST_DIFF_SCAN, 5, 
10}}} the test doesn't pass, but it at least doesn't crash.

> Tablet server crashes when handle diffscan request 
> ---
>
> Key: KUDU-3108
> URL: https://issues.apache.org/jira/browse/KUDU-3108
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: YifanZhang
>Priority: Major
>
> When we ran an incremental backup of tables in a cluster with 20 tservers, 
> 3 tservers crashed; the coredump stacks are the same:
> {code:java}
> Unable to find source-code formatter for language: shell. Available languages 
> are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, 
> groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, 
> perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, 
> yamlProgram terminated with signal 11, Segmentation fault.Program terminated 
> with signal 11, Segmentation fault.
> #0  kudu::Schema::Compare 
> (this=0x25b883680, lhs=..., rhs=...) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
> 267 /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h: No such file 
> or directory.
> Missing separate debuginfos, use: debuginfo-install 
> bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64 
> cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64 
> cyrus-sasl-plain-2.1.26-20.el7_2.x86_64 elfutils-libelf-0.166-2.el7.x86_64 
> elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7_3.1.x86_64 
> keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 
> libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 
> libcom_err-1.42.9-9.el7.x86_64 libdb-5.3.21-19.el7.x86_64 
> libgcc-4.8.5-28.el7_5.1.x86_64 libselinux-2.5-6.el7.x86_64 
> ncurses-libs-5.9-13.20130511.el7.x86_64 
> nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 
> openssl-libs-1.0.1e-60.el7_3.1.x86_64 pcre-8.32-15.el7_2.1.x86_64 
> systemd-libs-219-30.el7_3.8.x86_64 xz-libs-5.2.2-1.el7.x86_64 
> zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0  kudu::Schema::Compare 
> (this=0x25b883680, lhs=..., rhs=...) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
> #1  0x01da51fb in kudu::MergeIterator::RefillHotHeap 
> (this=this@entry=0x78f6ec500) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:720
> #2  0x01da622b in kudu::MergeIterator::AdvanceAndReheap 
> (this=this@entry=0x78f6ec500, state=0xd1661a000, 
> num_rows_to_advance=num_rows_to_advance@entry=1)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:690
> #3  0x01da7927 in kudu::MergeIterator::MaterializeOneRow 
> (this=this@entry=0x78f6ec500, dst=dst@entry=0x7f0d5cc9ffc0, 
> dst_row_idx=dst_row_idx@entry=0x7f0d5cc9fbb0)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:894
> #4  0x01da7de3 in kudu::MergeIterator::NextBlock (this=0x78f6ec500, 
> dst=0x7f0d5cc9ffc0) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/generic_iterators.cc:796
> #5  0x00a9ff19 in kudu::tablet::Tablet::Iterator::NextBlock 
> (this=, dst=) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tablet/tablet.cc:2499
> #6  0x0095475c in 
> kudu::tserver::TabletServiceImpl::HandleContinueScanRequest 
> (this=this@entry=0x53b5a90, req=req@entry=0x7f0d5cca0720,     
> rpc_context=rpc_context@entry=0x5e512a460, 
> result_collector=result_collector@entry=0x7f0d5cca0a00, 
> has_more_results=has_more_results@entry=0x7f0d5cca0886,     
> error_code=error_code@entry=0x7f0d5cca0888) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2565
> #7  0x00966564 in 
> kudu::tserver::TabletServiceImpl::HandleNewScanRequest 
> (this=this@entry=0x53b5a90, replica=0xf5c0189c0, req=req@entry=0x2a15c240,    
>  rpc_context=rpc_context@entry=0x5e512a460, 
> result_collector=result_collector@entry=0x7f0d5cca0a00, 
> scanner_id=scanner_id@entry=0x7f0d5cca0940,     
> snap_timestamp=snap_timestamp@entry=0x7f0d5cca0950, 
> has_more_results=has_more_results@entry=0x7f0d5cca0886, 
> error_code=error_code@entry=0x7f0d5cca0888)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:2476
> #8  0x00967f4b in kudu::tserver::TabletServiceImpl::Scan 
> (this=0x53b5a90, req=0x2a15c240, resp=0x56f9be6c0, context=0x5e512a460)    at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/tserver/tablet_service.cc:1674
> #9  0x01d2e449 in operator() (__args#2=0x5e512a460, 
> __args#1=0x56f9be6c0, __args#0=, this=0x497ecdd8) at 
> /usr/include/c++/4.8.2/functional:2471
> #10 kudu::rpc::

[jira] [Comment Edited] (KUDU-3108) Tablet server crashes when handle diffscan request

2020-10-30 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223996#comment-17223996
 ] 

Andrew Wong edited comment on KUDU-3108 at 10/31/20, 4:30 AM:
--

I've been doing some fuzz testing using {{fuzz-itest.cc}} and reproduced this 
crash with the following sequence (ignore the \{{-1}}s – their functionality is 
not committed yet).
{code:java}
TEST_F(FuzzTest, Kudu3108) {
  CreateTabletAndStartClusterWithSchema(CreateKeyValueTestSchema());
  RunFuzzCase({
{TEST_INSERT, 1, -1},
{TEST_FLUSH_OPS, -1},
{TEST_FLUSH_TABLET},
{TEST_INSERT_IGNORE, 3, -1},
{TEST_DELETE, 1},
{TEST_FLUSH_OPS, -1},
{TEST_FLUSH_TABLET},
{TEST_UPSERT, 3},
{TEST_UPSERT_PK_ONLY, 1},
{TEST_INSERT, 0, -1},
{TEST_FLUSH_OPS, -1},
{TEST_UPDATE_IGNORE, 0},
{TEST_UPDATE, 3},
{TEST_FLUSH_OPS, -1},
{TEST_DIFF_SCAN, 5, 15},
  });
}
{code}
This results in the following crash:
{code:java}
F1030 21:16:58.411253 40800 schema.h:706] Check failed: 
KeyEquals(*lhs.schema()) && KeyEquals(*rhs.schema())
*** Check failure stack trace: ***
*** Aborted at 1604117818 (unix time) try "date -d @1604117818" if you are 
using GNU date ***
PC: @ 0x7f701fcf11d7 __GI_raise
*** SIGABRT (@0x11179efd) received by PID 40701 (TID 0x7f6ff0f47700) from 
PID 40701; stack trace: ***
@ 0x7f7026a70370 (unknown)
@ 0x7f701fcf11d7 __GI_raise
@ 0x7f701fcf28c8 __GI_abort
@ 0x7f70224377b9 google::logging_fail()
@ 0x7f7022438f8d google::LogMessage::Fail()
@ 0x7f702243aee3 google::LogMessage::SendToLog()
@ 0x7f7022438ae9 google::LogMessage::Flush()
@ 0x7f702243b86f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f702cc99fbc kudu::Schema::Compare<>()
@ 0x7f7026167cfd kudu::MergeIterator::RefillHotHeap()
@ 0x7f7026167357 kudu::MergeIterator::AdvanceAndReheap()
@ 0x7f7026169617 kudu::MergeIterator::MaterializeOneRow()
@ 0x7f70261688e9 kudu::MergeIterator::NextBlock()
@ 0x7f702cbddd9b kudu::tablet::Tablet::Iterator::NextBlock()
@ 0x7f70317bcab3 
kudu::tserver::TabletServiceImpl::HandleContinueScanRequest()
@ 0x7f70317bb857 
kudu::tserver::TabletServiceImpl::HandleNewScanRequest()
@ 0x7f70317b464e kudu::tserver::TabletServiceImpl::Scan()
@ 0x7f702ddfd762 
_ZZN4kudu7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE4_clESG_SH_SJ_
@ 0x7f702de0064d 
_ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E4_E9_M_invokeERKSt9_Any_dataS4_S5_S9_
@ 0x7f702b4ddcc2 std::function<>::operator()()
@ 0x7f702b4dd6ed kudu::rpc::GeneratedServiceIf::Handle()
@ 0x7f702b4dfff8 kudu::rpc::ServicePool::RunThread()
@ 0x7f702b4de8c5 _ZZN4kudu3rpc11ServicePool4InitEiENKUlvE_clEv
@ 0x7f702b4e0337 
_ZNSt17_Function_handlerIFvvEZN4kudu3rpc11ServicePool4InitEiEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x7f7033524b9c std::function<>::operator()()
@ 0x7f70248227e0 kudu::Thread::SuperviseThread()
@ 0x7f7026a68dc5 start_thread
@ 0x7f701fdb376d __clone
Aborted
{code}

I haven't fully grokked this sequence, but I will look into this in the coming 
days.


was (Author: andrew.wong):
I've been doing some fuzz testing using {{fuzz-itest.cc}} and reproduced this 
crash with the following sequence (ignore the \{{-1}}s – their functionality is 
not committed yet).
{code:java}
TEST_F(FuzzTest, Kudu3108) {
  CreateTabletAndStartClusterWithSchema(CreateKeyValueTestSchema());
  RunFuzzCase({
{TEST_INSERT, 1, -1},
{TEST_FLUSH_OPS, -1},
{TEST_FLUSH_TABLET},
{TEST_INSERT_IGNORE, 3, -1},
{TEST_DELETE, 1},
{TEST_FLUSH_OPS, -1},
{TEST_FLUSH_TABLET},
{TEST_UPSERT, 3},
{TEST_UPSERT_PK_ONLY, 1},
{TEST_INSERT, 0, -1},
{TEST_FLUSH_OPS, -1},
{TEST_UPDATE_IGNORE, 0},
{TEST_UPDATE, 3},
{TEST_FLUSH_OPS, -1},
{TEST_UPSERT_PK_ONLY, 1},
{TEST_UPSERT, 3},
{TEST_INSERT, 2, -1},
{TEST_DIFF_SCAN, 5, 15},
  });
}
{code}
This results in the following crash:
{code:java}
F1030 21:16:58.411253 40800 schema.h:706] Check failed: 
KeyEquals(*lhs.schema()) && KeyEquals(*rhs.schema())
*** Check failure stack trace: ***
*** Aborted at 1604117818 (unix time) try "date -d @1604117818" if you are 
using GNU date ***
PC: @ 0x7f701fcf11d7 __GI_raise
*** SIGABRT (@0x11179efd) received by PID 40701 (TID 0x7f6ff0f47700) from 
PID 40701; stack trace: ***
@ 0x7f7026a70370 (unknown)
@ 0x7f701fcf11d7 __GI_raise
@ 0x7f701fcf28c8 __GI_abort
@ 0x7f70224377b9 google::logging_fail()
@ 0x7f7022438f

[jira] [Commented] (KUDU-3108) Tablet server crashes when handle diffscan request

2020-10-30 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223996#comment-17223996
 ] 

Andrew Wong commented on KUDU-3108:
---

I've been doing some fuzz testing using {{fuzz-itest.cc}} and reproduced this 
crash with the following sequence (ignore the \{{-1}}s – their functionality is 
not committed yet).
{code:java}
TEST_F(FuzzTest, Kudu3108) {
  CreateTabletAndStartClusterWithSchema(CreateKeyValueTestSchema());
  RunFuzzCase({
{TEST_INSERT, 1, -1},
{TEST_FLUSH_OPS, -1},
{TEST_FLUSH_TABLET},
{TEST_INSERT_IGNORE, 3, -1},
{TEST_DELETE, 1},
{TEST_FLUSH_OPS, -1},
{TEST_FLUSH_TABLET},
{TEST_UPSERT, 3},
{TEST_UPSERT_PK_ONLY, 1},
{TEST_INSERT, 0, -1},
{TEST_FLUSH_OPS, -1},
{TEST_UPDATE_IGNORE, 0},
{TEST_UPDATE, 3},
{TEST_FLUSH_OPS, -1},
{TEST_UPSERT_PK_ONLY, 1},
{TEST_UPSERT, 3},
{TEST_INSERT, 2, -1},
{TEST_DIFF_SCAN, 5, 15},
  });
}
{code}
This results in the following crash:
{code:java}
F1030 21:16:58.411253 40800 schema.h:706] Check failed: 
KeyEquals(*lhs.schema()) && KeyEquals(*rhs.schema())
*** Check failure stack trace: ***
*** Aborted at 1604117818 (unix time) try "date -d @1604117818" if you are 
using GNU date ***
PC: @ 0x7f701fcf11d7 __GI_raise
*** SIGABRT (@0x11179efd) received by PID 40701 (TID 0x7f6ff0f47700) from 
PID 40701; stack trace: ***
@ 0x7f7026a70370 (unknown)
@ 0x7f701fcf11d7 __GI_raise
@ 0x7f701fcf28c8 __GI_abort
@ 0x7f70224377b9 google::logging_fail()
@ 0x7f7022438f8d google::LogMessage::Fail()
@ 0x7f702243aee3 google::LogMessage::SendToLog()
@ 0x7f7022438ae9 google::LogMessage::Flush()
@ 0x7f702243b86f google::LogMessageFatal::~LogMessageFatal()
@ 0x7f702cc99fbc kudu::Schema::Compare<>()
@ 0x7f7026167cfd kudu::MergeIterator::RefillHotHeap()
@ 0x7f7026167357 kudu::MergeIterator::AdvanceAndReheap()
@ 0x7f7026169617 kudu::MergeIterator::MaterializeOneRow()
@ 0x7f70261688e9 kudu::MergeIterator::NextBlock()
@ 0x7f702cbddd9b kudu::tablet::Tablet::Iterator::NextBlock()
@ 0x7f70317bcab3 
kudu::tserver::TabletServiceImpl::HandleContinueScanRequest()
@ 0x7f70317bb857 
kudu::tserver::TabletServiceImpl::HandleNewScanRequest()
@ 0x7f70317b464e kudu::tserver::TabletServiceImpl::Scan()
@ 0x7f702ddfd762 
_ZZN4kudu7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS_12MetricEntityEERKS2_INS_3rpc13ResultTrackerEEENKUlPKN6google8protobuf7MessageEPSE_PNS7_10RpcContextEE4_clESG_SH_SJ_
@ 0x7f702de0064d 
_ZNSt17_Function_handlerIFvPKN6google8protobuf7MessageEPS2_PN4kudu3rpc10RpcContextEEZNS6_7tserver21TabletServerServiceIfC1ERK13scoped_refptrINS6_12MetricEntityEERKSD_INS7_13ResultTrackerEEEUlS4_S5_S9_E4_E9_M_invokeERKSt9_Any_dataS4_S5_S9_
@ 0x7f702b4ddcc2 std::function<>::operator()()
@ 0x7f702b4dd6ed kudu::rpc::GeneratedServiceIf::Handle()
@ 0x7f702b4dfff8 kudu::rpc::ServicePool::RunThread()
@ 0x7f702b4de8c5 _ZZN4kudu3rpc11ServicePool4InitEiENKUlvE_clEv
@ 0x7f702b4e0337 
_ZNSt17_Function_handlerIFvvEZN4kudu3rpc11ServicePool4InitEiEUlvE_E9_M_invokeERKSt9_Any_data
@ 0x7f7033524b9c std::function<>::operator()()
@ 0x7f70248227e0 kudu::Thread::SuperviseThread()
@ 0x7f7026a68dc5 start_thread
@ 0x7f701fdb376d __clone
Aborted
{code}

I haven't fully grokked this sequence, but I will look into this in the coming 
days.
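
For reference, the failed {{CHECK}} in {{schema.h}} asserts that the two rows 
being compared were projected with the same key columns; the stack above shows 
exactly that precondition failing under {{MergeIterator::RefillHotHeap()}}. A 
minimal sketch of the invariant with simplified stand-in types (not the actual 
Kudu classes):
{code:c++}
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-ins for kudu::Schema and a row; illustration only.
struct Schema {
  std::vector<std::string> key_columns;

  // The property the failed CHECK verifies: both rows' schemas agree with
  // this schema on the key columns.
  bool KeyEquals(const Schema& other) const {
    return key_columns == other.key_columns;
  }
};

struct Row {
  const Schema* schema;
  std::vector<int64_t> key_values;
};

// Mirrors the precondition in schema.h: comparing two rows is only
// meaningful when both were built against the same key schema.
int Compare(const Schema& s, const Row& lhs, const Row& rhs) {
  assert(s.KeyEquals(*lhs.schema) && s.KeyEquals(*rhs.schema));
  if (lhs.key_values < rhs.key_values) return -1;
  if (lhs.key_values > rhs.key_values) return 1;
  return 0;
}

int main() {
  Schema key_schema{{"key"}};
  Row a{&key_schema, {1}};
  Row b{&key_schema, {3}};
  return Compare(key_schema, a, b) < 0 ? 0 : 1;
}
{code}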

> Tablet server crashes when handle diffscan request 
> ---
>
> Key: KUDU-3108
> URL: https://issues.apache.org/jira/browse/KUDU-3108
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: YifanZhang
>Priority: Major
>
> When we ran an incremental backup of tables in a cluster with 20 tservers, 
> 3 tservers crashed; the coredump stacks are the same:
> {code:java}
> Unable to find source-code formatter for language: shell. Available languages 
> are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, 
> groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, 
> perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, 
> yamlProgram terminated with signal 11, Segmentation fault.Program terminated 
> with signal 11, Segmentation fault.
> #0  kudu::Schema::Compare 
> (this=0x25b883680, lhs=..., rhs=...) at 
> /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h:267
> 267 /home/zhangyifan8/work/kudu-xm/src/kudu/common/rowblock.h: No such file 
> or directory.
> Missing separate debuginfos, use: debuginfo-install 
> bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-gssapi-2.1.26-20.el7_2.x86_64 
> cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 cyrus-sasl-md5-2.1.26-20.el7_2.x86_64 
> cyrus-sasl-plai

[jira] [Created] (KUDU-3209) Allow decommissioning tool to run without rebalancing the rest of the cluster

2020-10-28 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3209:
-

 Summary: Allow decommissioning tool to run without rebalancing the 
rest of the cluster
 Key: KUDU-3209
 URL: https://issues.apache.org/jira/browse/KUDU-3209
 Project: Kudu
  Issue Type: Improvement
  Components: CLI, ops-tooling
Reporter: Andrew Wong


Currently when specifying {{--move_replicas_from_ignored_tservers}} to the 
rebalancer tool, the tool first empties the ignored tablet servers, and then 
runs rebalancing of the rest of the cluster. While true to its name as the 
"rebalancer tool", this tight coupling isn't always desired, especially given 
how heavy-weight a full cluster rebalancing can be upon first usage.

It'd be nice if users could specify some {{--empty_ignored_tservers_only}} flag 
that made no attempt at further rebalancing once decommissioning completes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3204) Add a metric for amount of available space

2020-10-21 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3204:
-

 Summary: Add a metric for amount of available space
 Key: KUDU-3204
 URL: https://issues.apache.org/jira/browse/KUDU-3204
 Project: Kudu
  Issue Type: Improvement
  Components: fs, metrics
Reporter: Andrew Wong


It'd be convenient to expose a metric for how much space there is available on 
each tablet server (accounting for {{fs_wal_dir_reserved_bytes}} and 
{{fs_data_dirs_reserved_bytes}}). This would be useful in implementing a 
replica placement policy based on available space.

It's probably worth separating metrics for available WAL directory space and 
available data directory space. E.g. we may want to treat a lack of space 
differently depending on whether we are limited on WAL space and not limited on 
data space, and vice versa.
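
A minimal sketch of how such per-directory gauges could be computed, treating 
the reserved-bytes flags as plain inputs and using a direct filesystem query 
(names and paths are illustrative, not the actual fs manager code):
{code:c++}
#include <algorithm>
#include <cstdint>
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical stand-ins for the --fs_wal_dir_reserved_bytes and
// --fs_data_dirs_reserved_bytes settings.
struct ReservedBytes {
  int64_t wal_dir = 0;
  int64_t data_dirs = 0;
};

// Bytes the server could still use in 'dir' after honoring the reservation,
// clamped at zero so the gauge never goes negative.
int64_t AvailableBytes(const std::string& dir, int64_t reserved_bytes) {
  std::error_code ec;
  const std::filesystem::space_info info = std::filesystem::space(dir, ec);
  if (ec) return 0;  // Treat an unreadable directory as having no free space.
  const int64_t free_bytes = static_cast<int64_t>(info.available);
  return std::max<int64_t>(0, free_bytes - reserved_bytes);
}

int main() {
  ReservedBytes reserved{1LL << 30, 1LL << 30};  // 1 GiB reserved each.
  // Separate gauges for the WAL directory and a data directory, as
  // suggested above.
  int64_t wal_available = AvailableBytes("/data/kudu/wal", reserved.wal_dir);
  int64_t data_available = AvailableBytes("/data/kudu/data", reserved.data_dirs);
  return (wal_available >= 0 && data_available >= 0) ? 0 : 1;
}
{code}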



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3149) Lock contention between registering ops and computing maintenance op stats

2020-10-19 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3149:
--
Fix Version/s: 1.14.0
   Resolution: Fixed
   Status: Resolved  (was: In Review)

> Lock contention between registering ops and computing maintenance op stats
> --
>
> Key: KUDU-3149
> URL: https://issues.apache.org/jira/browse/KUDU-3149
> Project: Kudu
>  Issue Type: Bug
>  Components: perf, tserver
>Reporter: Andrew Wong
>Priority: Critical
> Fix For: 1.14.0
>
>
> We saw a bunch of tablets bootstrapping extremely slowly, and many stuck 
> supposedly bootstrapping, but not showing up in the {{/tablets}} page, i.e. 
> we could only see INITIALIZED and RUNNING tablets, no BOOTSTRAPPING.
> Upon digging into the stacks, we saw a bunch waiting to acquire the MM lock:
> {code:java}
> TID 46577(tablet-open [wo):
> @ 0x7f1dd57147e0  (unknown)
> @ 0x7f1dd5713332  (unknown)
> @ 0x7f1dd570e5d8  (unknown)
> @ 0x7f1dd570e4a7  (unknown)
> @  0x23b4058  kudu::Mutex::Acquire()
> @  0x23980ff  kudu::MaintenanceManager::RegisterOp()
> @   0xb59b99  kudu::tablet::Tablet::RegisterMaintenanceOps()
> @   0xb855a1  
> kudu::tablet::TabletReplica::RegisterMaintenanceOps()
> @   0xa0055b  kudu::tserver::TSTabletManager::OpenTablet()
> @  0x23f994c  kudu::ThreadPool::DispatchThread()
> @  0x23f3f8b  kudu::Thread::SuperviseThread()
> @ 0x7f1dd570caa1  (unknown)
> @ 0x7f1dd3b18bcd  (unknown)
> TID 46574(tablet-open [wo):
> @ 0x7f1dd57147e0  (unknown)
> @ 0x7f1dd5713332  (unknown)
> @ 0x7f1dd570e5d8  (unknown)
> @ 0x7f1dd570e4a7  (unknown)
> @  0x23b4058  kudu::Mutex::Acquire()
> @  0x23980ff  kudu::MaintenanceManager::RegisterOp()
> @   0xb59c74  kudu::tablet::Tablet::RegisterMaintenanceOps()
> @   0xb855a1  
> kudu::tablet::TabletReplica::RegisterMaintenanceOps()
> @   0xa0055b  kudu::tserver::TSTabletManager::OpenTablet()
> @  0x23f994c  kudu::ThreadPool::DispatchThread()
> @  0x23f3f8b  kudu::Thread::SuperviseThread()
> @ 0x7f1dd570caa1  (unknown)
> @ 0x7f1dd3b18bcd  (unknown)
> 7 threads with same stack:
> TID 46575(tablet-open [wo):
> TID 46576(tablet-open [wo):
> TID 46578(tablet-open [wo):
> TID 46580(tablet-open [wo):
> TID 46581(tablet-open [wo):
> TID 46582(tablet-open [wo):
> TID 46583(tablet-open [wo):
> @ 0x7f1dd57147e0  (unknown)
> @ 0x7f1dd5713332  (unknown)
> @ 0x7f1dd570e5d8  (unknown)
> @ 0x7f1dd570e4a7  (unknown)
> @  0x23b4058  kudu::Mutex::Acquire()
> @  0x23980ff  kudu::MaintenanceManager::RegisterOp()
> @   0xb85374  
> kudu::tablet::TabletReplica::RegisterMaintenanceOps()
> @   0xa0055b  kudu::tserver::TSTabletManager::OpenTablet()
> @  0x23f994c  kudu::ThreadPool::DispatchThread()
> @  0x23f3f8b  kudu::Thread::SuperviseThread()
> @ 0x7f1dd570caa1  (unknown)
> @ 0x7f1dd3b18bcd  (unknown)
> TID 46573(tablet-open [wo):
> @ 0x7f1dd57147e0  (unknown)
> @ 0x7f1dd5713332  (unknown)
> @ 0x7f1dd570e5d8  (unknown)
> @ 0x7f1dd570e4a7  (unknown)
> @  0x23b4058  kudu::Mutex::Acquire()
> @  0x23980ff  kudu::MaintenanceManager::RegisterOp()
> @   0xb854c7  
> kudu::tablet::TabletReplica::RegisterMaintenanceOps()
> @   0xa0055b  kudu::tserver::TSTabletManager::OpenTablet()
> @  0x23f994c  kudu::ThreadPool::DispatchThread()
> @  0x23f3f8b  kudu::Thread::SuperviseThread()
> @ 0x7f1dd570caa1  (unknown)
> @ 0x7f1dd3b18bcd  (unknown)
> 2 threads with same stack:
> TID 43795(MaintenanceMgr ):
> TID 43796(MaintenanceMgr ):
> @ 0x7f1dd57147e0  (unknown)
> @ 0x7f1dd5713332  (unknown)
> @ 0x7f1dd570e5d8  (unknown)
> @ 0x7f1dd570e4a7  (unknown)
> @  0x23b4058  kudu::Mutex::Acquire()
> @  0x239a064  kudu::MaintenanceManager::LaunchOp()
> @  0x23f994c  kudu::ThreadPool::DispatchThread()
> @  0x23f3f8b  kudu::Thread::SuperviseThread()
> @ 0x7f1dd570caa1  (unknown)
> @ 0x7f1dd3b18bcd  (unknown)
> {code}
> A couple more stacks show some work being done by the maintenance manager:
> {code:java}
> TID 43794(MaintenanceMgr ):
> @ 0x7f1dd57147e0  (unknown)
> @   0xba7b41  
> kudu::tablet::BudgetedCompactionPolicy::RunApproximation()
> @   0xba8f5d  
> kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
> @   0xb5b1a1  kudu::tablet::Tablet::Pic

[jira] [Assigned] (KUDU-3149) Lock contention between registering ops and computing maintenance op stats

2020-10-19 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong reassigned KUDU-3149:
-

Assignee: Andrew Wong

> Lock contention between registering ops and computing maintenance op stats
> --
>
> Key: KUDU-3149
> URL: https://issues.apache.org/jira/browse/KUDU-3149
> Project: Kudu
>  Issue Type: Bug
>  Components: perf, tserver
>Reporter: Andrew Wong
>Assignee: Andrew Wong
>Priority: Critical
> Fix For: 1.14.0
>
>
> We saw a bunch of tablets bootstrapping extremely slowly, and many stuck 
> supposedly bootstrapping, but not showing up in the {{/tablets}} page, i.e. 
> we could only see INITIALIZED and RUNNING tablets, no BOOTSTRAPPING.
> Upon digging into the stacks, we saw a bunch waiting to acquire the MM lock:
> {code:java}
> TID 46577(tablet-open [wo):
> @ 0x7f1dd57147e0  (unknown)
> @ 0x7f1dd5713332  (unknown)
> @ 0x7f1dd570e5d8  (unknown)
> @ 0x7f1dd570e4a7  (unknown)
> @  0x23b4058  kudu::Mutex::Acquire()
> @  0x23980ff  kudu::MaintenanceManager::RegisterOp()
> @   0xb59b99  kudu::tablet::Tablet::RegisterMaintenanceOps()
> @   0xb855a1  
> kudu::tablet::TabletReplica::RegisterMaintenanceOps()
> @   0xa0055b  kudu::tserver::TSTabletManager::OpenTablet()
> @  0x23f994c  kudu::ThreadPool::DispatchThread()
> @  0x23f3f8b  kudu::Thread::SuperviseThread()
> @ 0x7f1dd570caa1  (unknown)
> @ 0x7f1dd3b18bcd  (unknown)
> TID 46574(tablet-open [wo):
> @ 0x7f1dd57147e0  (unknown)
> @ 0x7f1dd5713332  (unknown)
> @ 0x7f1dd570e5d8  (unknown)
> @ 0x7f1dd570e4a7  (unknown)
> @  0x23b4058  kudu::Mutex::Acquire()
> @  0x23980ff  kudu::MaintenanceManager::RegisterOp()
> @   0xb59c74  kudu::tablet::Tablet::RegisterMaintenanceOps()
> @   0xb855a1  
> kudu::tablet::TabletReplica::RegisterMaintenanceOps()
> @   0xa0055b  kudu::tserver::TSTabletManager::OpenTablet()
> @  0x23f994c  kudu::ThreadPool::DispatchThread()
> @  0x23f3f8b  kudu::Thread::SuperviseThread()
> @ 0x7f1dd570caa1  (unknown)
> @ 0x7f1dd3b18bcd  (unknown)
> 7 threads with same stack:
> TID 46575(tablet-open [wo):
> TID 46576(tablet-open [wo):
> TID 46578(tablet-open [wo):
> TID 46580(tablet-open [wo):
> TID 46581(tablet-open [wo):
> TID 46582(tablet-open [wo):
> TID 46583(tablet-open [wo):
> @ 0x7f1dd57147e0  (unknown)
> @ 0x7f1dd5713332  (unknown)
> @ 0x7f1dd570e5d8  (unknown)
> @ 0x7f1dd570e4a7  (unknown)
> @  0x23b4058  kudu::Mutex::Acquire()
> @  0x23980ff  kudu::MaintenanceManager::RegisterOp()
> @   0xb85374  
> kudu::tablet::TabletReplica::RegisterMaintenanceOps()
> @   0xa0055b  kudu::tserver::TSTabletManager::OpenTablet()
> @  0x23f994c  kudu::ThreadPool::DispatchThread()
> @  0x23f3f8b  kudu::Thread::SuperviseThread()
> @ 0x7f1dd570caa1  (unknown)
> @ 0x7f1dd3b18bcd  (unknown)
> TID 46573(tablet-open [wo):
> @ 0x7f1dd57147e0  (unknown)
> @ 0x7f1dd5713332  (unknown)
> @ 0x7f1dd570e5d8  (unknown)
> @ 0x7f1dd570e4a7  (unknown)
> @  0x23b4058  kudu::Mutex::Acquire()
> @  0x23980ff  kudu::MaintenanceManager::RegisterOp()
> @   0xb854c7  
> kudu::tablet::TabletReplica::RegisterMaintenanceOps()
> @   0xa0055b  kudu::tserver::TSTabletManager::OpenTablet()
> @  0x23f994c  kudu::ThreadPool::DispatchThread()
> @  0x23f3f8b  kudu::Thread::SuperviseThread()
> @ 0x7f1dd570caa1  (unknown)
> @ 0x7f1dd3b18bcd  (unknown)
> 2 threads with same stack:
> TID 43795(MaintenanceMgr ):
> TID 43796(MaintenanceMgr ):
> @ 0x7f1dd57147e0  (unknown)
> @ 0x7f1dd5713332  (unknown)
> @ 0x7f1dd570e5d8  (unknown)
> @ 0x7f1dd570e4a7  (unknown)
> @  0x23b4058  kudu::Mutex::Acquire()
> @  0x239a064  kudu::MaintenanceManager::LaunchOp()
> @  0x23f994c  kudu::ThreadPool::DispatchThread()
> @  0x23f3f8b  kudu::Thread::SuperviseThread()
> @ 0x7f1dd570caa1  (unknown)
> @ 0x7f1dd3b18bcd  (unknown)
> {code}
> A couple more stacks show some work being done by the maintenance manager:
> {code:java}
> TID 43794(MaintenanceMgr ):
> @ 0x7f1dd57147e0  (unknown)
> @   0xba7b41  
> kudu::tablet::BudgetedCompactionPolicy::RunApproximation()
> @   0xba8f5d  
> kudu::tablet::BudgetedCompactionPolicy::PickRowSets()
> @   0xb5b1a1  kudu::tablet::Tablet::PickRowSetsToCompact()
> @  

[jira] [Created] (KUDU-3203) Allow clients to support reading decimals with wider bit-width

2020-10-19 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3203:
-

 Summary: Allow clients to support reading decimals with wider 
bit-width
 Key: KUDU-3203
 URL: https://issues.apache.org/jira/browse/KUDU-3203
 Project: Kudu
  Issue Type: Improvement
  Components: client
Reporter: Andrew Wong


Today, decimal bit-width is entirely determined by Kudu. When creating a schema 
of a given precision and scale, Kudu determines the correct bit-width for the 
parameters, and uses that to store values.

Client scanners can only specify reading DECIMAL (ignorant of bit-width). In 
requesting the columnar layout, however, it'd be nice if client scanners could 
also specify the desired bit-width to get back from tservers, and have the 
tservers inflate values as appropriate. This would be helpful, e.g. to read 
DECIMAL32- and DECIMAL64-stored data in Arrow, which currently only supports 
DECIMAL128 and DECIMAL256.
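
For illustration, inflating a narrower decimal is just a sign extension of the 
unscaled integer; precision and scale are unchanged. A minimal sketch with 
simplified types (not the actual Kudu or Arrow API), using the gcc/clang 
{{__int128}} extension:
{code:c++}
#include <cstdint>
#include <iostream>

// A decimal value is an unscaled integer plus (precision, scale); widening
// the storage only sign-extends the unscaled integer, so the logical value
// (unscaled * 10^-scale) is unchanged. Simplified types, illustration only.
struct Decimal128 {
  __int128 unscaled;
  int8_t precision;
  int8_t scale;
};

Decimal128 WidenDecimal32(int32_t unscaled, int8_t precision, int8_t scale) {
  // 1234 with scale 2 is 12.34 whether it is stored in 32 or 128 bits.
  return Decimal128{static_cast<__int128>(unscaled), precision, scale};
}

int main() {
  Decimal128 d = WidenDecimal32(/*unscaled=*/1234, /*precision=*/9, /*scale=*/2);
  std::cout << static_cast<int64_t>(d.unscaled) << " scale=" << int{d.scale} << "\n";
  return 0;
}
{code}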



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-3071) Expose splitKeyRanges in the C++ clients

2020-10-11 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-3071.
---
Fix Version/s: n/a
   Resolution: Won't Fix

Impala can actually leverage the split key ranges feature already because it 
would (and does, following IMPALA-9792) generate tokens from its frontend.

> Expose splitKeyRanges in the C++ clients
> 
>
> Key: KUDU-3071
> URL: https://issues.apache.org/jira/browse/KUDU-3071
> Project: Kudu
>  Issue Type: Improvement
>  Components: client, perf
>Reporter: Andrew Wong
>Priority: Major
> Fix For: n/a
>
>
> KUDU-2437 introduced the server-side ability to return "split keys" that 
> logically divide a given tablet into key ranges. KUDU-2670 introduced an 
> improvement in the Spark integration's KuduRDD to allow Spark to use this to 
> generate smaller-scoped scan tokens that each scan a chunk of a tablet 
> instead of entire tablets. This decoupled a table's partitioning scheme 
> from its read concurrency limitations.
> It'd be great if we could expose chunked-token-hydration in the C++ client so 
> that Impala can begin generating these chunked tokens and then hydrating them 
> into smaller scanners in its backend.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3161) Include FileSystem Path in UUID Mismatch Error

2020-10-11 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3161:
--
Labels: newbie  (was: )

> Include FileSystem Path in UUID Mismatch Error
> --
>
> Key: KUDU-3161
> URL: https://issues.apache.org/jira/browse/KUDU-3161
> Project: Kudu
>  Issue Type: Improvement
>Reporter: David Mollitor
>Priority: Minor
>  Labels: newbie
>
> {code:none}
> Check failed: _s.ok() Bad status: Corruption: Failed to load FS layout: 
> Mismatched UUIDs across filesystem roots: 2935c5f89ee45654bfcf3bf67569f11f 
> vs. d7af50d73dae4fa38de386bc583cab22; configuring multiple Kudu processes 
> with the same directory is not supported
> {code}
> Please enhance this logging to dump the UUID and location of each file system 
> so that the problematic directory(ies) can be quickly determined.
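
A minimal sketch of the requested message format, using hypothetical stand-in 
types rather than the actual FsManager code:
{code:c++}
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for a filesystem root discovered at startup.
struct FsRoot {
  std::string path;
  std::string uuid;
};

// Expanded error text: one "path (uuid)" entry per root, so a mismatched
// root can be located immediately.
std::string DescribeRoots(const std::vector<FsRoot>& roots) {
  std::string msg = "Mismatched UUIDs across filesystem roots:";
  for (const FsRoot& r : roots) {
    msg += "\n  " + r.path + " (" + r.uuid + ")";
  }
  return msg;
}

int main() {
  std::cout << DescribeRoots({{"/data/1/kudu", "2935c5f89ee45654bfcf3bf67569f11f"},
                              {"/data/2/kudu", "d7af50d73dae4fa38de386bc583cab22"}})
            << std::endl;
  return 0;
}
{code}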



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KUDU-3197) Tablet keeps all history schemas in memory may result in high memory consumption

2020-10-09 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211340#comment-17211340
 ] 

Andrew Wong edited comment on KUDU-3197 at 10/9/20, 8:16 PM:
-

You're right in that the scanner creates a schema to represent its projection. 
However, the underlying iterators may take references to the current schema 
while iterating, the tablet service might take references while preparing to 
scan, etc. While most if not all of these accesses are short-lived, we need to 
be careful not to destruct the schemas while these references still exist.

Grepping around to audit current usages (with some false positives for copies 
and log messages):
{code:java}
~/Repositories/kudu/src/kudu > grep -r "meta.*[^_]schema()" .
./tablet/tablet-schema-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet-schema-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet-schema-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet-schema-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet-schema-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet_metadata.cc:if (!(*metadata)->schema().Equals(schema)) {
./tablet/tablet_metadata.cc:"match expected schema ($1)", 
(*metadata)->schema().ToString(),
./tablet/diff_scan-test.cc:  SchemaBuilder 
builder(tablet->metadata()->schema());
./tablet/tablet_replica-test.cc:  
ASSERT_OK(SchemaToPB(SchemaBuilder(tablet()->metadata()->schema()).Build(), 
&orig_schema_pb));
./tablet/tablet_replica-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet_replica-test.cc:  
ASSERT_OK(SchemaToPB(SchemaBuilder(tablet()->metadata()->schema()).Build(), 
&orig_schema_pb));
./tablet/tablet_replica-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet.cc:  : key_schema_(metadata->schema().CreateKeyProjection()),
./tablet/tablet.cc:  metadata_->SetSchema(*op_state->schema(), 
op_state->schema_version());
./tablet/tablet.cc:  RollingDiskRowSetWriter drsw(metadata_.get(), 
merge->schema(), DefaultBloomSizing(),
./tablet/rowset_metadata.h:return tablet_metadata_->schema();
./tablet/tablet.h:return &metadata_->schema();
./tablet/all_types-scan-correctness-test.cc:SchemaBuilder 
builder(tablet()->metadata()->schema());
./tools/kudu-tool-test.cc:  .PartitionDebugString(meta->partition(), 
meta->schema());
./tools/kudu-tool-test.cc:  debug_str = meta->schema().ToString();
./tools/kudu-tool-test.cc:.PartitionDebugString(meta->partition(), 
meta->schema());
./tools/kudu-tool-test.cc:debug_str = meta->schema().ToString();
./tools/tool_action_local_replica.cc:const auto& col_idx = 
meta->schema().find_column_by_id(col_id);
./tools/tool_action_local_replica.cc:
meta->schema().column(col_idx).name() : "?");
./tools/tool_action_local_replica.cc:  Schema schema = meta->schema();
./tools/tool_action_local_replica.cc:  const Schema& schema = meta->schema();
./tools/tool_action_local_replica.cc:   
 meta->schema())
./tserver/tablet_service.cc:Schema tablet_schema = 
replica->tablet_metadata()->schema();
./tserver/tablet_service.cc:  const auto& schema = 
replica->tablet_metadata()->schema();
./tserver/tablet_service.cc:  
CHECK_OK(SchemaToPB(replica->tablet_metadata()->schema(),
./tserver/tablet_service.cc:  const auto& schema = 
replica->tablet_metadata()->schema();
./tserver/tablet_service.cc:  Schema tablet_schema = 
replica->tablet_metadata()->schema();
./tserver/tablet_service.cc:  const auto& schema = 
replica->tablet_metadata()->schema();
./tserver/tablet_service.cc:  const Schema& tablet_schema = 
replica->tablet_metadata()->schema();
./tserver/scanners.cc:
spec().lower_bound_key()->Stringify(tablet_metadata->schema();
./tserver/scanners.cc:
spec().exclusive_upper_bound_key()->Stringify(tablet_metadata->schema();
./tserver/tserver_path_handlers.cc: 
tmeta->schema());
./tserver/tserver_path_handlers.cc:  const Schema& schema = tmeta->schema();
./master/sys_catalog.cc:  if (!metadata->schema().Equals(BuildTableSchema())) {
./master/sys_catalog.cc:return(Status::Corruption("Unexpected schema", 
metadata->schema().ToString()));
./client/scan_token-internal.cc:
RETURN_NOT_OK(SchemaFromPB(metadata.schema(), &schema));
./client/client-test.cc:Schema schema = 
tablet_replica->tablet()->metadata()->schema();
./client/client-test.cc:Schema schema = 
tablet_replica->tablet()->metadata()->schema();
./client/client-test.cc:Schema schema = 
tablet_replica->tablet()->metadata()->schema();
./client/client-test.cc:Schema schema = 
tablet_replica->tablet()->metada

[jira] [Commented] (KUDU-3197) Tablet keeps all history schemas in memory may result in high memory consumption

2020-10-09 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211340#comment-17211340
 ] 

Andrew Wong commented on KUDU-3197:
---

You're right in that the scanner creates a schema to represent its projection. 
However, the underlying iterators may take references to the current schema 
while iterating, the tablet service might take references while preparing to 
scan, etc. While most if not all of these accesses are short-lived, we need to 
be careful not to destruct the schemas while these references still exist.

Grepping around to audit current usages (with some false positives for copies 
and log messages):
{code:java}
~/Repositories/kudu/src/kudu > grep -r "meta.*[^_]schema()" .
./tablet/tablet-schema-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet-schema-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet-schema-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet-schema-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet-schema-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet_metadata.cc:if (!(*metadata)->schema().Equals(schema)) {
./tablet/tablet_metadata.cc:"match expected schema ($1)", 
(*metadata)->schema().ToString(),
./tablet/diff_scan-test.cc:  SchemaBuilder 
builder(tablet->metadata()->schema());
./tablet/tablet_replica-test.cc:  
ASSERT_OK(SchemaToPB(SchemaBuilder(tablet()->metadata()->schema()).Build(), 
&orig_schema_pb));
./tablet/tablet_replica-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet_replica-test.cc:  
ASSERT_OK(SchemaToPB(SchemaBuilder(tablet()->metadata()->schema()).Build(), 
&orig_schema_pb));
./tablet/tablet_replica-test.cc:  SchemaBuilder 
builder(tablet()->metadata()->schema());
./tablet/tablet.cc:  : key_schema_(metadata->schema().CreateKeyProjection()),
./tablet/tablet.cc:  metadata_->SetSchema(*op_state->schema(), 
op_state->schema_version());
./tablet/tablet.cc:  RollingDiskRowSetWriter drsw(metadata_.get(), 
merge->schema(), DefaultBloomSizing(),
./tablet/rowset_metadata.h:return tablet_metadata_->schema();
./tablet/tablet.h:return &metadata_->schema();
./tablet/all_types-scan-correctness-test.cc:SchemaBuilder 
builder(tablet()->metadata()->schema());
./tools/kudu-tool-test.cc:  .PartitionDebugString(meta->partition(), 
meta->schema());
./tools/kudu-tool-test.cc:  debug_str = meta->schema().ToString();
./tools/kudu-tool-test.cc:.PartitionDebugString(meta->partition(), 
meta->schema());
./tools/kudu-tool-test.cc:debug_str = meta->schema().ToString();
./tools/tool_action_local_replica.cc:const auto& col_idx = 
meta->schema().find_column_by_id(col_id);
./tools/tool_action_local_replica.cc:
meta->schema().column(col_idx).name() : "?");
./tools/tool_action_local_replica.cc:  Schema schema = meta->schema();
./tools/tool_action_local_replica.cc:  const Schema& schema = meta->schema();
./tools/tool_action_local_replica.cc:   
 meta->schema())
./tserver/tablet_service.cc:Schema tablet_schema = 
replica->tablet_metadata()->schema();
./tserver/tablet_service.cc:  const auto& schema = 
replica->tablet_metadata()->schema();
./tserver/tablet_service.cc:  
CHECK_OK(SchemaToPB(replica->tablet_metadata()->schema(),
./tserver/tablet_service.cc:  const auto& schema = 
replica->tablet_metadata()->schema();
./tserver/tablet_service.cc:  Schema tablet_schema = 
replica->tablet_metadata()->schema();
./tserver/tablet_service.cc:  const auto& schema = 
replica->tablet_metadata()->schema();
./tserver/tablet_service.cc:  const Schema& tablet_schema = 
replica->tablet_metadata()->schema();
./tserver/scanners.cc:
spec().lower_bound_key()->Stringify(tablet_metadata->schema();
./tserver/scanners.cc:
spec().exclusive_upper_bound_key()->Stringify(tablet_metadata->schema();
./tserver/tserver_path_handlers.cc: 
tmeta->schema());
./tserver/tserver_path_handlers.cc:  const Schema& schema = tmeta->schema();
./master/sys_catalog.cc:  if (!metadata->schema().Equals(BuildTableSchema())) {
./master/sys_catalog.cc:return(Status::Corruption("Unexpected schema", 
metadata->schema().ToString()));
./client/scan_token-internal.cc:
RETURN_NOT_OK(SchemaFromPB(metadata.schema(), &schema));
./client/client-test.cc:Schema schema = 
tablet_replica->tablet()->metadata()->schema();
./client/client-test.cc:Schema schema = 
tablet_replica->tablet()->metadata()->schema();
./client/client-test.cc:Schema schema = 
tablet_replica->tablet()->metadata()->schema();
./client/client-test.cc:Schema schema = 
tablet_replica->tablet()->metadata()->schema();
./client/client-test.cc:Schema 

[jira] [Commented] (KUDU-3197) Tablet keeps all history schemas in memory may result in high memory consumption

2020-09-28 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203521#comment-17203521
 ] 

Andrew Wong commented on KUDU-3197:
---

Yeah, if I understand the concerns about old schemas here, the proposed 
approach seems pretty unsafe. If a scan lasts longer than the time it takes to 
update the schema 20 times, we might hit a segfault.

After a brief look around for how the tablet metadatas' schemas are used on the 
tablet servers, for a ref-counting solution, it's probably worth identifying 
the top-level "owners" of the current schema pointers, i.e. the current callers 
of {{TabletMetadata::schema()}}, and ensuring that any references that those 
owners pass around either outlive the owners themselves or are also ref-counted.
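
A minimal sketch of that ref-counting direction, assuming the metadata hands 
out {{std::shared_ptr<const Schema>}} so a reader pins whichever schema version 
it started with (illustrative stand-ins, not the real classes):
{code:c++}
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

struct Schema {
  std::vector<std::string> columns;
};

// Stand-in for TabletMetadata: the current schema is swapped atomically under
// a lock, and readers get a shared_ptr that keeps whatever version they saw
// alive until they drop it. No unbounded old_schemas_ vector is needed.
class TabletMetadataLike {
 public:
  explicit TabletMetadataLike(Schema s)
      : schema_(std::make_shared<const Schema>(std::move(s))) {}

  std::shared_ptr<const Schema> schema() const {
    std::lock_guard<std::mutex> l(lock_);
    return schema_;
  }

  void SetSchema(Schema s) {
    auto next = std::make_shared<const Schema>(std::move(s));
    std::lock_guard<std::mutex> l(lock_);
    schema_ = std::move(next);  // Old schema is freed once all readers finish.
  }

 private:
  mutable std::mutex lock_;
  std::shared_ptr<const Schema> schema_;
};

int main() {
  TabletMetadataLike meta(Schema{{"key", "val"}});
  auto seen_by_scanner = meta.schema();  // Scanner pins the version it started with.
  meta.SetSchema(Schema{{"key", "val", "extra"}});
  // 'seen_by_scanner' is still valid here even though the schema changed.
  return seen_by_scanner->columns.size() == 2 ? 0 : 1;
}
{code}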

> Tablet keeps all history schemas in memory may result in high memory 
> consumption
> 
>
> Key: KUDU-3197
> URL: https://issues.apache.org/jira/browse/KUDU-3197
> Project: Kudu
>  Issue Type: Improvement
>  Components: tablet
>Affects Versions: 1.12.0
>Reporter: wangningito
>Assignee: wangningito
>Priority: Minor
> Attachments: image-2020-09-25-14-45-33-402.png, 
> image-2020-09-25-14-49-30-913.png, image-2020-09-25-15-05-44-948.png
>
>
> When a table is altered frequently, the memory consumption of 
> kudu-tserver may be very high, and that memory is not tracked on the memory 
> page. 
> This is the memory usage of a tablet: tablet-xxx's peak memory consumption is 
> 3.6G, but none of its children's memory comes close to that.
> !image-2020-09-25-14-45-33-402.png!
> So I used pprof to get a heap sample. The tserver had been running for a long 
> time, but memory is still being consumed by TabletBootstrap::PlayAlterSchemaRequest. 
> !image-2020-09-25-14-49-30-913.png!
> I changed `old_schemas_` in tablet_metadata.h to a fixed-size vector: 
>     // Previous values of 'schema_'.
> // These are currently kept alive forever, under the assumption that
> // a given tablet won't have thousands of "alter table" calls.
> // They are kept alive so that callers of schema() don't need to
> // worry about reference counting or locking.
> std::vector<Schema*> old_schemas_;
> The heap sampling then becomes
>  !image-2020-09-25-15-05-44-948.png! 
> So, to make the application layer more flexible, it could be better to make 
> the size of old_schemas_ configurable.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3134) Adjust default value for --raft_heartbeat_interval

2020-09-28 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203499#comment-17203499
 ] 

Andrew Wong commented on KUDU-3134:
---

It's worth noting that an increased heartbeat interval has implications for 
scans. Safe time is currently updated on followers via heartbeats from the 
leader, and one of the first things we do in snapshot scans as followers is 
wait for safe time to be advanced past the snapshot timestamp. As such, if we 
set a high heartbeat interval, scans to followers may end up timing out waiting 
for the safetime to be bumped.

https://github.com/apache/kudu/blob/20fde59bca1f9df5a3cdee48f7794e0e8f16784a/src/kudu/tserver/tablet_service.cc#L3101
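
Roughly, the follower-side wait looks like the sketch below (hypothetical 
names): the scan blocks until safe time reaches the snapshot timestamp, so 
with a long heartbeat interval that wait alone can exhaust the scan deadline.
{code:c++}
#include <chrono>
#include <cstdint>
#include <thread>

using Clock = std::chrono::steady_clock;
using Timestamp = int64_t;

// Stand-in for a tablet replica's safe time, which on a follower only
// advances when a leader heartbeat (or replicated write) arrives.
struct ReplicaLike {
  Timestamp safe_time = 0;
  Timestamp SafeTime() const { return safe_time; }
};

// Mirrors the "wait for safe time >= snapshot timestamp" step at the start of
// a follower snapshot scan. Returns false on deadline, which the client sees
// as a scan timeout when heartbeats are too far apart.
bool WaitForSafeTime(const ReplicaLike& replica, Timestamp snapshot_ts,
                     Clock::time_point deadline) {
  while (replica.SafeTime() < snapshot_ts) {
    if (Clock::now() >= deadline) return false;
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  return true;
}

int main() {
  ReplicaLike replica;
  replica.safe_time = 100;
  // Snapshot timestamp already covered by safe time: returns immediately.
  bool ok = WaitForSafeTime(replica, /*snapshot_ts=*/50,
                            Clock::now() + std::chrono::seconds(1));
  return ok ? 0 : 1;
}
{code}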

> Adjust default value for --raft_heartbeat_interval
> --
>
> Key: KUDU-3134
> URL: https://issues.apache.org/jira/browse/KUDU-3134
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.12.0
>Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Major
>
> Users often increase the `--raft_heartbeat_interval` on larger clusters or on 
> clusters with high replica counts. This helps avoid the servers flooding each 
> other with heartbeat RPCs causing queue overflows and using too much idle 
> CPU. Users have adjusted the values from 1.5 seconds to as high as 10s and we 
> have never seen people complain about problems after doing so.
> Anecdotally, I recently saw a cluster with 4k tablets per tablet server using 
> ~150% cpu usage while idle. By increasing the `--raft_heartbeat_interval` 
> from 500ms to 1500ms the cpu usage dropped to ~50%.
> Generally speaking users often care about Kudu stability and scalability over 
> an extremely short MTTR. Additionally our default client RPC timeouts of 30s 
> also seem to indicate slightly longer failover/retry times are tolerable in 
> the default case. 
> We should consider adjusting the default value of `--raft_heartbeat_interval` 
> to a higher value  to support larger and more efficient clusters by default. 
> Users who need a low MTTR can always adjust the value lower while also 
> adjusting other related timeouts. We may also want to consider adjusting the 
> default `--heartbeat_interval_ms` accordingly.
> Note: Batching the RPCs like mentioned in KUDU-1973 or providing a server to 
> server proxy for heartbeating may be a way to solve the issues without 
> adjusting the default configuration. However, adjusting the configuration is 
> easy and has proven effective in production deployments. Additionally 
> adjusting the defaults along with a KUDU-1973 like approach could lead to 
> even lower idle resource usage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KUDU-3191) Fail tablet replicas that suffer from KUDU-2233 instead of crashing

2020-09-17 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong reassigned KUDU-3191:
-

Assignee: Andrew Wong

> Fail tablet replicas that suffer from KUDU-2233 instead of crashing
> ---
>
> Key: KUDU-3191
> URL: https://issues.apache.org/jira/browse/KUDU-3191
> Project: Kudu
>  Issue Type: Task
>  Components: compaction
>Reporter: Andrew Wong
>Assignee: Andrew Wong
>Priority: Major
>
> KUDU-2233 results in persisted corruption that causes a broken invariant, 
> leading to a server crash. The recovery process for this corruption is 
> arduous, especially if there are multiple tablet replicas in a given server 
> that suffer from it -- users typically start the server, see the crash, 
> remove the affected replica manually via tooling, and restart, repeatedly 
> until the server comes up healthily.
> Instead, we should consider treating this as we do CFile block-level 
> corruption[1] and fail the tablet replica. At best, we end up recovering from 
> a non-corrupted replica. At worst, we'd end up with multiple corrupted 
> replicas, which is still better than what we have today, which is multiple 
> corrupted replicas and unavailable servers that lead to excessive 
> re-replication.
> [1] 
> https://github.com/apache/kudu/commit/cf6927cb153f384afb649b664de1d4276bd6d83f
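
A minimal sketch of the proposed behavior, with hypothetical stand-in types: 
surface the violated invariant as a {{Status}} instead of a {{CHECK}}, and let 
the caller fail only the affected replica.
{code:c++}
#include <iostream>
#include <string>
#include <utility>

// Tiny stand-ins for kudu::Status and a tablet replica; illustration only.
struct Status {
  bool ok;
  std::string msg;
  static Status OK() { return {true, ""}; }
  static Status Corruption(std::string m) { return {false, std::move(m)}; }
};

struct ReplicaLike {
  bool failed = false;
  void SetFailed(const Status& s) {
    failed = true;
    std::cerr << "replica failed: " << s.msg << std::endl;
  }
};

// Before: a CHECK on this invariant aborts the whole tablet server.
// After: the violation becomes a Status the caller can act on.
Status CheckCompactionInvariant(bool rows_are_ordered) {
  if (!rows_are_ordered) {
    return Status::Corruption("compaction input rows out of order (KUDU-2233)");
  }
  return Status::OK();
}

int main() {
  ReplicaLike replica;
  Status s = CheckCompactionInvariant(/*rows_are_ordered=*/false);
  if (!s.ok) {
    replica.SetFailed(s);  // Fail only this replica; re-replication handles the rest.
  }
  return replica.failed ? 0 : 1;
}
{code}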



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3193) Per-tablet histogram for scan predicate efficiency

2020-09-11 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3193:
--
Description: 
Often times slow queries can be the result of a sub-optimal schema for a given 
workload, e.g. if a scan's predicate is not on a prefix of the primary key. 
Diagnosing such issues typically takes some understanding of the workloads that 
are being run against a given table. It'd be nice if there were something more 
quantitative to understand whether a table(t)'s schema is to blame for a slow 
scan.

One thought that comes to mind is maintaining a histogram metric per-tablet of 
the ratio between the number of rows returned during a given scan and the 
number of rows iterated through during that scan. A consistently low value of 
this metric would indicate that predicates applied to the given tablet are 
doing a lot of IO reading rows that are not in the results set.

  was:
Often times slow queries can be the result of a sub-optimal schema for a given 
workload, e.g. if a scan's predicate is not on a prefix of the primary key. 
Diagnosing such issues typically takes some understanding of the workloads that 
are being run against a given table. It'd be nice if there were something more 
quantitative to understand whether a table(t)'s schema is to blame for a slow 
scan.

One thought that comes to mind is maintaining a histogram metric per-tablet of 
the ratio between the number of rows returned during a given scan and the 
number of rows iterated through during that scan. A consistently low value of 
this metric would indicate that predicates applied to the given tablet are not 
very effective.


> Per-tablet histogram for scan predicate efficiency
> --
>
> Key: KUDU-3193
> URL: https://issues.apache.org/jira/browse/KUDU-3193
> Project: Kudu
>  Issue Type: Task
>  Components: metrics, ops-tooling, perf, tablet
>Reporter: Andrew Wong
>Priority: Major
>
> Often times slow queries can be the result of a sub-optimal schema for a 
> given workload, e.g. if a scan's predicate is not on a prefix of the primary 
> key. Diagnosing such issues typically takes some understanding of the 
> workloads that are being run against a given table. It'd be nice if there 
> were something more quantitative to understand whether a table(t)'s schema is 
> to blame for a slow scan.
> One thought that comes to mind is maintaining a histogram metric per-tablet 
> of the ratio between the number of rows returned during a given scan and the 
> number of rows iterated through during that scan. A consistently low value of 
> this metric would indicate that predicates applied to the given tablet are 
> doing a lot of IO reading rows that are not in the results set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3193) Per-tablet histogram for scan predicate efficiency

2020-09-11 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3193:
--
Description: 
Often times slow queries can be the result of a sub-optimal schema for a given 
workload, e.g. if a scan's predicate is not on a prefix of the primary key. 
Diagnosing such issues typically takes some understanding of the workloads that 
are being run against a given table. It'd be nice if there were something more 
quantitative to understand whether a table(t)'s schema is to blame for a slow 
scan.

One thought that comes to mind is maintaining a histogram metric per-tablet of 
the ratio between the number of rows returned during a given scan and the 
number of rows iterated through during that scan. A consistently low value of 
this metric would indicate that predicates applied to the given tablet are not 
very effective.

  was:
Often times slow queries can be the result of a sub-optimal schema for a given 
workload, e.g. if a scan's predicate is not on a prefix of the primary key. 
Diagnosing such issues typically takes some understanding of the workloads that 
are being run against a given table. It'd be nice if there were something more 
quantitative to understand whether a table(t)'s schema is to blame for a slow 
scan.

One thought that comes to mind is maintaining a histogram metric per-tablet of 
the ratio between the number of rows returned during a given and the number of 
rows iterated through during that scan. A consistently low value of this metric 
would indicate that predicates applied to the given tablet are not very 
effective.


> Per-tablet histogram for scan predicate efficiency
> --
>
> Key: KUDU-3193
> URL: https://issues.apache.org/jira/browse/KUDU-3193
> Project: Kudu
>  Issue Type: Task
>  Components: metrics, ops-tooling, perf, tablet
>Reporter: Andrew Wong
>Priority: Major
>
> Often times slow queries can be the result of a sub-optimal schema for a 
> given workload, e.g. if a scan's predicate is not on a prefix of the primary 
> key. Diagnosing such issues typically takes some understanding of the 
> workloads that are being run against a given table. It'd be nice if there 
> were something more quantitative to understand whether a table(t)'s schema is 
> to blame for a slow scan.
> One thought that comes to mind is maintaining a histogram metric per-tablet 
> of the ratio between the number of rows returned during a given scan and the 
> number of rows iterated through during that scan. A consistently low value of 
> this metric would indicate that predicates applied to the given tablet are 
> not very effective.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3193) Per-tablet histogram for scan predicate efficiency

2020-09-11 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3193:
-

 Summary: Per-tablet histogram for scan predicate efficiency
 Key: KUDU-3193
 URL: https://issues.apache.org/jira/browse/KUDU-3193
 Project: Kudu
  Issue Type: Task
  Components: metrics, ops-tooling, perf, tablet
Reporter: Andrew Wong


Often times slow queries can be the result of a sub-optimal schema for a given 
workload, e.g. if a scan's predicate is not on a prefix of the primary key. 
Diagnosing such issues typically takes some understanding of the workloads that 
are being run against a given table. It'd be nice if there were something more 
quantitative to understand whether a table(t)'s schema is to blame for a slow 
scan.

One thought that comes to mind is maintaining a histogram metric per-tablet of 
the ratio between the number of rows returned during a given and the number of 
rows iterated through during that scan. A consistently low value of this metric 
would indicate that predicates applied to the given tablet are not very 
effective.
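
As a very rough sketch of how such a metric could be recorded at the end of a scan (the histogram name, the counters, and the hook below are hypothetical, not existing Kudu APIs):

{code:cpp}
#include <cstdint>

// Sketch only: compute the value that would be recorded in a hypothetical
// per-tablet "scan predicate efficiency" histogram. 'rows_scanned' is the
// number of rows iterated during the scan, 'rows_returned' the number that
// survived predicate evaluation and went back to the client.
int64_t ScanEfficiencyPct(int64_t rows_scanned, int64_t rows_returned) {
  if (rows_scanned <= 0) return 100;  // An empty scan is trivially "efficient".
  return rows_returned * 100 / rows_scanned;
}

// At scan completion one would then do something like (names hypothetical):
//   tablet_metrics->scan_predicate_efficiency->Increment(
//       ScanEfficiencyPct(rows_scanned, rows_returned));
{code}

A tablet whose histogram is dominated by values near zero would be doing a lot of IO for rows that never reach the result set.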



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3192) Leverage cluster ID when playing HMS notifications

2020-09-03 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3192:
-

 Summary: Leverage cluster ID when playing HMS notifications
 Key: KUDU-3192
 URL: https://issues.apache.org/jira/browse/KUDU-3192
 Project: Kudu
  Issue Type: Task
  Components: hms
Reporter: Andrew Wong


KUDU-2574 added a unique cluster ID to the master system catalog table. We 
should leverage this with the HMS integration by 1) synchronizing the cluster 
ID to the HMS, storing it as a part of the table JSON, and 2) filtering the HMS 
notifications received by the HMS log listener based on cluster ID.
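
A rough sketch of what the filtering half (2) might look like in the notification log listener; the property key {{kudu.cluster_id}} and every name below are illustrative, not existing APIs:

{code:cpp}
#include <map>
#include <string>

// Sketch only: decide whether an HMS notification event should be replayed
// against the local catalog, assuming the table parameters carried by the
// event have already been parsed into a key/value map and that the cluster
// ID was synchronized into a parameter such as "kudu.cluster_id".
bool ShouldApplyEvent(const std::map<std::string, std::string>& table_params,
                      const std::string& local_cluster_id) {
  auto it = table_params.find("kudu.cluster_id");
  // Tables created before the cluster ID existed won't carry the parameter;
  // whether to apply or skip those events is a policy decision (skipped here).
  if (it == table_params.end()) {
    return false;
  }
  return it->second == local_cluster_id;
}
{code}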




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KUDU-3191) Fail tablet replicas that suffer from KUDU-2233 instead of crashing

2020-09-03 Thread Andrew Wong (Jira)
Andrew Wong created KUDU-3191:
-

 Summary: Fail tablet replicas that suffer from KUDU-2233 instead 
of crashing
 Key: KUDU-3191
 URL: https://issues.apache.org/jira/browse/KUDU-3191
 Project: Kudu
  Issue Type: Task
  Components: compaction
Reporter: Andrew Wong


KUDU-2233 results in persisted corruption that causes a broken invariant, 
leading to a server crash. The recovery process for this corruption is arduous, 
especially if there are multiple tablet replicas in a given server that suffer 
from it -- users typically start the server, see the crash, remove the affected 
replica manually via tooling, and restart, repeatedly until the server comes up 
healthily.

Instead, we should consider treating this as we do CFile block-level 
corruption[1] and fail the tablet replica. At best, we end up recovering from a 
non-corrupted replica. At worst, we'd end up with multiple corrupted replicas, 
which is still better than what we have today, which is multiple corrupted 
replicas and unavailable servers that lead to excessive re-replication.

[1] 
https://github.com/apache/kudu/commit/cf6927cb153f384afb649b664de1d4276bd6d83f
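
To make the shape of the proposal concrete, a sketch of turning the crash into a replica-level failure (every name and call site below is hypothetical; the real invariant check lives in the tablet/compaction code):

{code:cpp}
// Sketch only, using Status/PREDICT_FALSE in the style of kudu/util.
// Today the broken invariant is enforced with something like
//   CHECK(rows_in_order) << "out-of-order rows detected";
// which takes down the whole tablet server. Returning a Status instead lets
// the caller fail just the affected replica, the same way CFile block-level
// corruption is handled.
Status CheckRowOrderInvariant(bool rows_in_order) {
  if (PREDICT_FALSE(!rows_in_order)) {
    return Status::Corruption("out-of-order rows detected (KUDU-2233)");
  }
  return Status::OK();
}

// Hypothetical caller:
//   Status s = CheckRowOrderInvariant(...);
//   if (!s.ok()) {
//     tablet_replica->SetError(s);  // mark the replica FAILED, keep the server up
//     return s;
//   }
{code}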



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN

2020-08-25 Thread Andrew Wong (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong resolved KUDU-3119.
---
Fix Version/s: 1.12.0
   Resolution: Fixed

As far as I can tell (based on the logs and based on Cloudera's internal 
test-triaging history), the attached logs are all from Kudu 1.10, which doesn't 
have the fix for https://github.com/greg7mdp/sparsepp/issues/42. The version of 
sparsepp was bumped in 1.12 with 
[0fdfdc8|http://github.com/apache/kudu/commit/0fdfdc8].

> ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN
> ---
>
> Key: KUDU-3119
> URL: https://issues.apache.org/jira/browse/KUDU-3119
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI, test
>Reporter: Alexey Serbin
>Priority: Blocker
> Fix For: 1.12.0
>
> Attachments: kudu-tool-test.20200709.txt.xz, kudu-tool-test.3.txt.xz, 
> kudu-tool-test.log.xz
>
>
> Sometimes the {{TestFsAddRemoveDataDirEndToEnd}} scenario of the {{ToolTest}} 
> reports races for TSAN builds:
> {noformat}
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:266: Failure
> Failed
> Bad status: Runtime error: /tmp/dist-test-taskIZqSmU/build/tsan/bin/kudu: process exited with non-zero status 66
> Google Test trace:
> /data0/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/kudu-tool-test.cc:265:
> W0506 17:56:02.744191  4432 flags.cc:404] Enabled unsafe flag: --never_fsync=true
> I0506 17:56:02.780252  4432 fs_manager.cc:263] Metadata directory not provided
> I0506 17:56:02.780442  4432 fs_manager.cc:269] Using write-ahead log directory (fs_wal_dir) as metadata directory
> I0506 17:56:02.789638  4432 fs_manager.cc:399] Time spent opening directory manager: real 0.007s user 0.005s sys 0.002s
> I0506 17:56:02.789986  4432 env_posix.cc:1676] Not raising this process' open files per process limit of 1048576; it is already as high as it can go
> I0506 17:56:02.790426  4432 file_cache.cc:465] Constructed file cache lbm with capacity 419430
> ==
> WARNING: ThreadSanitizer: data race (pid=4432)
> ...
> {noformat}
> The log is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3119) ToolTest.TestFsAddRemoveDataDirEndToEnd reports race under TSAN

2020-08-11 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176072#comment-17176072
 ] 

Andrew Wong commented on KUDU-3119:
---

The race isn't quite where I expected, per the following lines in the logs:

{code:java}
  Write of size 1 at 0x7f82f790a760 by thread T5 (mutexes: write M1638):
#0 spp::sparsegroup<>::_sizing(unsigned int) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1103:56
 (libkudu_fs.so+0x102d70)
#1 void 
spp::sparsegroup<>::_set_aux<>(kudu::MemTrackerAllocator, 
std::__1::allocator > >&, unsigned char, std::__1::pair<>&, 
spp::integral_constant) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1392:31
 (libkudu_fs.so+0x102ac8)
#2 void 
spp::sparsegroup<>::_set<>(kudu::MemTrackerAllocator, 
std::__1::allocator > >&, unsigned char, unsigned char, 
std::__1::pair<>&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1426:13
 (libkudu_fs.so+0x102a56)
#3 std::__1::pair<>* spp::sparsegroup<>::set 
>(kudu::MemTrackerAllocator >, 
std::__1::allocator > > >&, unsigned char, 
std::__1::pair 
>&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1444:9
 (libkudu_fs.so+0x10295f)
#4 std::__1::pair<>& spp::sparsetable<>::set >(unsigned 
long, std::__1::pair<>&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:2236:25
 (libkudu_fs.so+0x1036ba)
#5 std::__1::pair<>& 
spp::sparse_hashtable<>::_insert_at > >(std::__1::pair >&, unsigned long, bool) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3173:22
 (libkudu_fs.so+0x101910)
#6 std::__1::pair<>& 
spp::sparse_hashtable<>::find_or_insert, kudu::BlockIdHash, 
kudu::BlockIdEqual, kudu::MemTrackerAllocator >, 
std::__1::allocator > > > 
>::DefaultValue>(kudu::BlockId const&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3282:28
 (libkudu_fs.so+0x1014a1)
#7 spp::sparse_hash_map<>::operator[](kudu::BlockId const&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3792:29
 (libkudu_fs.so+0xeece0)
#8 
kudu::fs::LogBlockManager::AddLogBlock(scoped_refptr)
 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/src/kudu/fs/log_block_manager.cc:2262:32
 (libkudu_fs.so+0xe6a27)
...

  Previous read of size 1 at 0x7f82f790a760 by thread T6 (mutexes: write M1637):
#0 spp::sparsegroup<>::_sizing(unsigned int) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1088:14
 (libkudu_fs.so+0x102d1c)
#1 void spp::sparsegroup<>::_set_aux > 
>(kudu::MemTrackerAllocator >, 
std::__1::allocator > > >&, unsigned char, 
std::__1::pair 
>&, spp::integral_constant) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1392:31
 (libkudu_fs.so+0x102ac8)
#2 void 
spp::sparsegroup<>::_set<>(kudu::MemTrackerAllocator, 
std::__1::allocator<> >&, unsigned char, unsigned char, std::__1::pair<>&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1426:13
 (libkudu_fs.so+0x102a56)
#3 std::__1::pair<>* spp::sparsegroup<>::set > 
>(kudu::MemTrackerAllocator >, 
std::__1::allocator > > >&, unsigned char, 
std::__1::pair 
>&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:1444:9
 (libkudu_fs.so+0x10295f)
#4 std::__1::pair<>& spp::sparsetable<>::set > >(unsigned long, 
std::__1::pair 
>&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:2236:25
 (libkudu_fs.so+0x1036ba)
#5 std::__1::pair<>& 
spp::sparse_hashtable<>::_insert_at > >(std::__1::pair >&, unsigned long, bool) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3173:22
 (libkudu_fs.so+0x101910)
#6 std::__1::pair<>& 
spp::sparse_hashtable<>::find_or_insert, kudu::BlockIdHash, 
kudu::BlockIdEqual, kudu::MemTrackerAllocator >, 
std::__1::allocator > > > 
>::DefaultValue>(kudu::BlockId const&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3282:28
 (libkudu_fs.so+0x1014a1)
#7 spp::sparse_hash_map<>::operator[](kudu::BlockId const&) 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/thirdparty/installed/common/include/sparsepp/spp.h:3792:29
 (libkudu_fs.so+0xeece0)
#8 
kudu::fs::LogBlockManager::AddLogBlock(scoped_refptr)
 
/data/jenkins/workspace/kudu-pre-commit-unittest-TSAN/src/kudu/fs/log_block_manager.cc:2262:32
 (l

[jira] [Commented] (KUDU-3176) Backup & restore incompatibility

2020-08-11 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175771#comment-17175771
 ] 

Andrew Wong commented on KUDU-3176:
---

What was the error seen here? Do you have application logs for the restore job? 
Or can you at least point to what the issue is?

> Backup & restore incompatibility
> 
>
> Key: KUDU-3176
> URL: https://issues.apache.org/jira/browse/KUDU-3176
> Project: Kudu
>  Issue Type: Bug
>Reporter: Attila Bukor
>Assignee: Attila Bukor
>Priority: Critical
>
> The ownership in the backup metadata introduced in KUDU-3090 seems to have 
> backward/forward compatibility issues as restoring a backup with 
> post-ownership backup tool that was created on a pre-ownership cluster with 
> the matching backup tool fails. Other combinations might also fail but I 
> haven't reproduced them so far.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3180) kudu don't always prefer to flush MRS/DMS that anchor more memory

2020-08-10 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17175080#comment-17175080
 ] 

Andrew Wong commented on KUDU-3180:
---

{quote}After we tuned -flush_threshold_secs to 1800(was 3600 before), we could 
avoid OOM
{quote}
If the server is running low on memory during these times, wouldn't that put it 
into memory-pressure mode anyway? If you're on 1.11, that should already 
schedule a flush for the mem-store that anchors the most memory. And in 1.12, 
based on the screenshots, we would still schedule some flushes for some fairly 
large mem-stores. Additionally, I would have expected write requests to also be 
throttled, further slowing down the memory growth. If there is no 
memory-pressure despite there being OOMs, I wonder if this could be related to 
KUDU-3030.
{quote}Maybe could use max(memory_size, time_since_last_flush to define perf 
improvement of a mem-store flush, so that both big mem-stores and long_lived 
mem-stores could be flushed in priority.
{quote}
Yeah, my biggest concern is that we don't regress KUDU-3002, since the perf 
score for {{time_since_last_flush}} is limited. If we go down this route, that 
may need to be adjusted.
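
For illustration, one shape such a combined score could take, keeping some weight on WAL bytes anchored so KUDU-3002-style WAL growth is not regressed (units, weights, and names here are invented for the sketch, not the maintenance manager's actual scoring):

{code:cpp}
#include <algorithm>
#include <cstdint>

// Sketch only: score a mem-store flush by the larger of its memory footprint
// and its age, plus a small term for the WAL bytes it anchors so that
// long-lived stores that pin lots of WAL still get flushed promptly.
double MemStoreFlushScore(int64_t mem_bytes, int64_t secs_since_last_flush,
                          int64_t wal_bytes_anchored) {
  const double mem_score  = static_cast<double>(mem_bytes) / (1 << 20);           // MB
  const double time_score = static_cast<double>(secs_since_last_flush) / 60.0;    // minutes
  const double wal_score  = static_cast<double>(wal_bytes_anchored) / (1 << 20);  // MB
  return std::max(mem_score, time_score) + 0.1 * wal_score;
}
{code}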

> kudu don't always prefer to flush MRS/DMS that anchor more memory
> -
>
> Key: KUDU-3180
> URL: https://issues.apache.org/jira/browse/KUDU-3180
> Project: Kudu
>  Issue Type: Improvement
>Reporter: YifanZhang
>Priority: Major
> Attachments: image-2020-08-04-20-26-53-749.png, 
> image-2020-08-04-20-28-00-665.png
>
>
> The current time-based flush policy always gives a flush op a high score if we 
> haven't flushed the tablet in a long time, which may lead to starvation of 
> ops that could free more memory.
> We set -flush_threshold_mb=32, -flush_threshold_secs=1800 in a cluster, and 
> found that some small MRS/DMS flushes have a higher perf score than big MRS/DMS 
> flushes and compactions, which does not seem reasonable.
> !image-2020-08-04-20-26-53-749.png|width=1424,height=317!!image-2020-08-04-20-28-00-665.png|width=1414,height=327!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KUDU-3180) kudu don't always prefer to flush MRS/DMS that anchor more memory

2020-08-07 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172948#comment-17172948
 ] 

Andrew Wong edited comment on KUDU-3180 at 8/7/20, 8:07 AM:


Looking through the code a bit to explain the 0B logs retained, it seems like 
logs retained only accounts for the size of ReadableLogSegments, meaning if a 
WAL segment is still being written to, it will not be accounted for in the 
space retained estimate. See GetReplaySizeMap() in consensus/log.h for more 
details.

{quote}It's not always true that older or larger mem-stores anchor more WAL 
bytes as far as I saw on /maintenance-manager page, so maybe we shouldn't 
always use WAL bytes anchored to determine what to flush.{quote}

That's true, but WAL bytes anchored will be somewhat correlated with both the 
size and the age, not taking into account the above replay size map discrepancy.

One question about your particular use case though: would tuning the 
{{--memory_pressure_percentage}} gflag help at all? If you reduce it 
significantly, you would guarantee MRS/DMS flushing would be prioritized over 
compactions. Admittedly, it will use the WAL bytes anchored to prioritize ops, 
but that should still work out to flush larger mem-stores in insert-mostly 
workloads.


was (Author: andrew.wong):
Looking through the code a bit to explain the 0B logs retained, it seems like 
logs retained only accounts for the size of ReadableLogSegments, meaning if a 
WAL segment is still being written to, it will be accounted for in the space 
retained estimate. See GetReplaySizeMap() in consensus/log.h for more details.

{quote}It's not always true that older or larger mem-stores anchor more WAL 
bytes as far as I saw on /maintenance-manager page, so maybe we shouldn't 
always use WAL bytes anchored to determine what to flush.{quote}

That's true, but WAL bytes anchored will be somewhat correlated with both the 
size and the age, not taking into account the above replay size map discrepancy.

One question about your particular use case though: would tuning the 
{{--memory_pressure_percentage}} gflag help at all? If you reduce it 
significantly, you would guarantee MRS/DMS flushing would be prioritized over 
compactions. Admittedly, it will use the WAL bytes anchored to prioritize ops, 
but that should still work out to flush larger mem-stores in insert-mostly 
workloads.

> kudu don't always prefer to flush MRS/DMS that anchor more memory
> -
>
> Key: KUDU-3180
> URL: https://issues.apache.org/jira/browse/KUDU-3180
> Project: Kudu
>  Issue Type: Improvement
>Reporter: YifanZhang
>Priority: Major
> Attachments: image-2020-08-04-20-26-53-749.png, 
> image-2020-08-04-20-28-00-665.png
>
>
> The current time-based flush policy always gives a flush op a high score if we 
> haven't flushed the tablet in a long time, which may lead to starvation of 
> ops that could free more memory.
> We set -flush_threshold_mb=32, -flush_threshold_secs=1800 in a cluster, and 
> found that some small MRS/DMS flushes have a higher perf score than big MRS/DMS 
> flushes and compactions, which does not seem reasonable.
> !image-2020-08-04-20-26-53-749.png|width=1424,height=317!!image-2020-08-04-20-28-00-665.png|width=1414,height=327!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-3180) kudu don't always prefer to flush MRS/DMS that anchor more memory

2020-08-07 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172948#comment-17172948
 ] 

Andrew Wong commented on KUDU-3180:
---

Looking through the code a bit, it seems like logs retained only accounts for 
the size of ReadableLogSegments, meaning if a WAL segment is still being 
written to, it will be accounted for in the space retained estimate. See 
GetReplaySizeMap() in consensus/log.h for more details.

{quote}It's not always true that older or larger mem-stores anchor more WAL 
bytes as far as I saw on /maintenance-manager page, so maybe we shouldn't 
always use WAL bytes anchored to determine what to flush.{quote}

That's true, but WAL bytes anchored will be somewhat correlated with both the 
size and the age, not taking into account the above replay size map discrepancy.

One question about your particular use case though: would tuning the 
{{--memory_pressure_percentage}} gflag help at all? If you reduce it 
significantly, you would guarantee MRS/DMS flushing would be prioritized over 
compactions. Admittedly, it will use the WAL bytes anchored to prioritize ops, 
but that should still work out to flush larger mem-stores in insert-mostly 
workloads.
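
For example, something along these lines on the tablet servers (flag values are purely illustrative and would need tuning against the actual memory limit and workload):

{noformat}
# Enter memory-pressure mode earlier, so mem-store flushes get prioritized
# over compactions sooner.
--memory_pressure_percentage=40
# Existing flush tuning from this thread, for reference:
--flush_threshold_mb=32
--flush_threshold_secs=1800
{noformat}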

> kudu don't always prefer to flush MRS/DMS that anchor more memory
> -
>
> Key: KUDU-3180
> URL: https://issues.apache.org/jira/browse/KUDU-3180
> Project: Kudu
>  Issue Type: Improvement
>Reporter: YifanZhang
>Priority: Major
> Attachments: image-2020-08-04-20-26-53-749.png, 
> image-2020-08-04-20-28-00-665.png
>
>
> The current time-based flush policy always gives a flush op a high score if we 
> haven't flushed the tablet in a long time, which may lead to starvation of 
> ops that could free more memory.
> We set -flush_threshold_mb=32, -flush_threshold_secs=1800 in a cluster, and 
> found that some small MRS/DMS flushes have a higher perf score than big MRS/DMS 
> flushes and compactions, which does not seem reasonable.
> !image-2020-08-04-20-26-53-749.png|width=1424,height=317!!image-2020-08-04-20-28-00-665.png|width=1414,height=327!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KUDU-3180) kudu don't always prefer to flush MRS/DMS that anchor more memory

2020-08-07 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172948#comment-17172948
 ] 

Andrew Wong edited comment on KUDU-3180 at 8/7/20, 7:22 AM:


Looking through the code a bit to explain the 0B logs retained, it seems like 
logs retained only accounts for the size of ReadableLogSegments, meaning if a 
WAL segment is still being written to, it will be accounted for in the space 
retained estimate. See GetReplaySizeMap() in consensus/log.h for more details.

{quote}It's not always true that older or larger mem-stores anchor more WAL 
bytes as far as I saw on /maintenance-manager page, so maybe we shouldn't 
always use WAL bytes anchored to determine what to flush.{quote}

That's true, but WAL bytes anchored will be somewhat correlated with both the 
size and the age, not taking into account the above replay size map discrepancy.

One question about your particular use case though: would tuning the 
{{--memory_pressure_percentage}} gflag help at all? If you reduce it 
significantly, you would guarantee MRS/DMS flushing would be prioritized over 
compactions. Admittedly, it will use the WAL bytes anchored to prioritize ops, 
but that should still work out to flush larger mem-stores in insert-mostly 
workloads.


was (Author: andrew.wong):
Looking through the code a bit, it seems like logs retained only accounts for 
the size of ReadableLogSegments, meaning if a WAL segment is still being 
written to, it will be accounted for in the space retained estimate. See 
GetReplaySizeMap() in consensus/log.h for more details.

{quote}It's not always true that older or larger mem-stores anchor more WAL 
bytes as far as I saw on /maintenance-manager page, so maybe we shouldn't 
always use WAL bytes anchored to determine what to flush.{quote}

That's true, but WAL bytes anchored will be somewhat correlated with both the 
size and the age, not taking into account the above replay size map discrepancy.

One question about your particular use case though: would tuning the 
{{--memory_pressure_percentage}} gflag help at all? If you reduce it 
significantly, you would guarantee MRS/DMS flushing would be prioritized over 
compactions. Admittedly, it will use the WAL bytes anchored to prioritize ops, 
but that should still work out to flush larger mem-stores in insert-mostly 
workloads.

> kudu don't always prefer to flush MRS/DMS that anchor more memory
> -
>
> Key: KUDU-3180
> URL: https://issues.apache.org/jira/browse/KUDU-3180
> Project: Kudu
>  Issue Type: Improvement
>Reporter: YifanZhang
>Priority: Major
> Attachments: image-2020-08-04-20-26-53-749.png, 
> image-2020-08-04-20-28-00-665.png
>
>
> The current time-based flush policy always gives a flush op a high score if we 
> haven't flushed the tablet in a long time, which may lead to starvation of 
> ops that could free more memory.
> We set -flush_threshold_mb=32, -flush_threshold_secs=1800 in a cluster, and 
> found that some small MRS/DMS flushes have a higher perf score than big MRS/DMS 
> flushes and compactions, which does not seem reasonable.
> !image-2020-08-04-20-26-53-749.png|width=1424,height=317!!image-2020-08-04-20-28-00-665.png|width=1414,height=327!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KUDU-3180) kudu don't always prefer to flush MRS/DMS that anchor more memory

2020-08-05 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171714#comment-17171714
 ] 

Andrew Wong edited comment on KUDU-3180 at 8/5/20, 7:36 PM:


I've been discussing with [~aserbin] and [~granthenke] about this problem, and 
one thing that stands out about the issue here is that it isn't obvious what 
quantifiable values we should optimize for here. I think there are a few things 
to care about:
 * Insert/update performance
 * Memory used by mem-stores
 * Space anchored by WALs
 * To some extent, write amplification and size of output disk-stores

These values don't explicitly trade off with one another, which makes it a bit 
difficult to determine the correct heuristic for when to flush mem-stores. Some 
different solutions we've been discussing are:
 * Defining some cost function based on the time since last flush AND memory 
used. This might be an improvement over today's policy, which uses a simple 
branching heuristic to pick based on time since last flush OR memory used.
 * Always using the WAL bytes anchored to determine what to flush. This has the 
benefit of somewhat taking into account both the time since last flush and 
memory used, in the sense that older mem-stores will tend to anchor more WAL 
bytes, and larger mem-stores will also tend to anchor more WAL bytes. This has 
the added benefit of keeping the "space anchored by WALs" value in mind, so we 
don't end up with something like KUDU-3002.
 * Update the policy based on the current amount of space used / memory used to 
pick the "right" values to trade off. E.g. if we are running low on WAL disk 
space, prioritize based on WAL bytes anchored; if we are running low on memory, 
prioritize based on memory used, etc.

Before exploring the solution space further, it'd be better to more clearly 
define the problem at hand. [~zhangyifan27] what are the values that look off 
to you? What tradeoffs would you prefer to make in filing this jira? Would 
something as simple as lowering {{-flush_threshold_mb}} or increasing 
{{-flush_threshold_secs}} help you?
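
As a sketch of the third option above, i.e. letting the resource that is currently under pressure decide which quantity the flush score is based on (thresholds and names are invented for the sketch):

{code:cpp}
// Sketch only: pick the quantity to score mem-store flushes by, depending on
// which resource is currently constrained. Thresholds are placeholders.
enum class FlushScoreBasis { kWalBytesAnchored, kMemoryAnchored, kTimeAndMemory };

FlushScoreBasis ChooseFlushScoreBasis(double wal_disk_used_pct,
                                      double memory_used_pct) {
  if (wal_disk_used_pct > 80.0) {
    return FlushScoreBasis::kWalBytesAnchored;  // WAL disk is the bottleneck.
  }
  if (memory_used_pct > 60.0) {
    return FlushScoreBasis::kMemoryAnchored;    // Memory is the bottleneck.
  }
  return FlushScoreBasis::kTimeAndMemory;       // Steady state.
}
{code}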


was (Author: andrew.wong):
I've been discussing with [~aserbin] and [~granthenke] about this problem, and 
one thing that stands out about the issue here is that it isn't obvious what 
quantifiable values we should optimize for here. I think there are a few things 
to care about:
 * Insert/update performance
 * Memory used by mem-stores
 * Space anchored by WALs
 * To some extent, write amplification and size of output disk-stores

These values don't explicitly trade off with one another, which makes it a bit 
difficult to determine the correct heuristic for when to flush mem-stores. Some 
different solutions we've been discussing are:
* Defining some cost function based on the time since last flush AND memory 
used. This might be an improvement over today's policy, which uses a simple 
branching heuristic to pick based on time since last flush OR memory used.
* Always using the WAL bytes anchored to determine what to flush. This has the 
benefit of somewhat taking into account both the time since last flush and 
memory used, in the sense that older mem-stores will tend to anchor more WAL 
bytes, and larger mem-stores will also tend to anchor more WAL bytes. This has 
the added benefit of keeping the "space anchored by WALs" value in mind, so we 
don't end up with something like KUDU-3002.
* Update the policy based on the current amount of space used / memory used to 
pick the "right" values to trade off. E.g. if we are running low on WAL disk 
space, prioritize based on WAL bytes anchored; if we are running low on memory, 
prioritize based on memory used, etc.

Before exploring the solution space further, it'd be better to more clearly 
define the problem at hand. [~zhangyifan27] what are the values that look off 
to you? What tradeoffs would you prefer to make in filing this jira? Would 
something as simple as lowering {{-flush_threshold_mb}} or increasing 
{{-flush_threshold_secs}} help you?

> kudu don't always prefer to flush MRS/DMS that anchor more memory
> -
>
> Key: KUDU-3180
> URL: https://issues.apache.org/jira/browse/KUDU-3180
> Project: Kudu
>  Issue Type: Bug
>Reporter: YifanZhang
>Priority: Major
> Attachments: image-2020-08-04-20-26-53-749.png, 
> image-2020-08-04-20-28-00-665.png
>
>
> The current time-based flush policy always gives a flush op a high score if we 
> haven't flushed the tablet in a long time, which may lead to starvation of 
> ops that could free more memory.
> We set -flush_threshold_mb=32, -flush_threshold_secs=1800 in a cluster, and 
> found that some small MRS/DMS flushes have a higher perf score than big MRS/DMS 
> flushes and compactions, which does not seem reasonable.
