[jira] [Resolved] (KUDU-639) Leader doesn't overwrite demoted follower's log properly
[ https://issues.apache.org/jira/browse/KUDU-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Percy resolved KUDU-639.
-----------------------------
    Resolution: Fixed

> Leader doesn't overwrite demoted follower's log properly
> --------------------------------------------------------
>
>                 Key: KUDU-639
>                 URL: https://issues.apache.org/jira/browse/KUDU-639
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: M4.5
>            Reporter: David Alves
>            Assignee: Todd Lipcon
>            Priority: Minor
>             Fix For: M5
>
> We just ran into this situation in the YCSB cluster, which is apparently a log divergence.
> We have nodes a, b, c (corresponding to nodes 33c8fb1dc4434df0938ccc27ecfd58a1/a1219, 4ed2e09f80e04d198edeb53e15b3539e/a1220, ab8ed89f9041495a95b8d2b77591c9d7/a1215).
> Node a is leader for term 3, then times out.
> Node b is elected leader for term 5 with votes from b and c.
> When b is elected leader, the log state is:
> State: All replicated op: 3.6546, Majority replicated op: 3.6533, Committed index: 3.6533, Last appended: 3.6546, Current term: 5
> b never actually replicates anything and eventually loses leadership to node a again.
> When b loses leadership, its WAL is in the following state:
> State: All replicated op: 0.0, Majority replicated op: 3.6533, Committed index: 3.6533, Last appended: 5.6547, Current term: 5
> That is, b appended a message in term 5 but never actually got to commit it.
> However, if we look at b's log we find a message in term 5 committed:
> 3.6546@99404 REPLICATE WRITE_OP
> COMMIT 3.6533
> 5.6547@99789 REPLICATE CHANGE_CONFIG_OP
> COMMIT 3.6535
> COMMIT 3.6536
> COMMIT 3.6537
> COMMIT 3.6538
> COMMIT 3.6534
> COMMIT 3.6541
> COMMIT 3.6540
> COMMIT 3.6543
> COMMIT 3.6542
> COMMIT 3.6545
> COMMIT 3.6546
> COMMIT 3.6544
> COMMIT 3.6539
> COMMIT 5.6547
> 3.6548@99430 REPLICATE WRITE_OP
> 6.6549@99795 REPLICATE CHANGE_CONFIG_OP
> More problematically, that diverges from the other two nodes' logs:
> 3.6546@99404 REPLICATE WRITE_OP
> COMMIT 3.6533
> COMMIT 3.6536
> COMMIT 3.6537
> COMMIT 3.6535
> COMMIT 3.6539
> COMMIT 3.6538
> COMMIT 3.6534
> COMMIT 3.6541
> COMMIT 3.6540
> COMMIT 3.6543
> COMMIT 3.6542
> COMMIT 3.6544
> 3.6547@99429 REPLICATE WRITE_OP
> 3.6548@99430 REPLICATE WRITE_OP
> 6.6549@99795 REPLICATE CHANGE_CONFIG_OP
> 6.6550@99878 REPLICATE WRITE_OP
> 6.6551@99879 REPLICATE WRITE_OP
> 6.6552@99880 REPLICATE WRITE_OP
> COMMIT 3.6545
> COMMIT 3.6548
> COMMIT 3.6547
> COMMIT 3.6546
> COMMIT 6.6549

-- This message was sent by Atlassian Jira (v8.3.4#803005)
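The fix the issue title calls for is Raft's log-matching repair: when node a retakes leadership, its AppendEntries traffic must make b truncate the conflicting, never-committed 5.6547 entry before accepting the new leader's entries. A minimal illustrative sketch in Python (not Kudu's C++ consensus code; the log is modeled as a list of (term, payload) pairs):

```python
def append_entries(log, prev_index, prev_term, entries):
    """Follower-side AppendEntries log repair, illustrative only.

    log: in-place list of (term, payload) tuples; entry i lives at index i+1
    in Raft terms. Returns False if the consistency check fails; otherwise
    truncates any conflicting suffix and appends the leader's entries.
    """
    # Consistency check: the follower must hold the entry that precedes
    # the incoming batch, with a matching term.
    if prev_index > 0:
        if len(log) < prev_index or log[prev_index - 1][0] != prev_term:
            return False
    # Walk the incoming entries; at the first index whose term disagrees
    # with the leader's, delete the follower's suffix and take the
    # leader's entry instead. This is the overwrite step that deletes
    # b's uncommitted term-5 entry.
    for i, entry in enumerate(entries):
        idx = prev_index + i  # 0-based position in `log`
        if idx < len(log):
            if log[idx][0] != entry[0]:
                del log[idx:]
                log.append(entry)
        else:
            log.append(entry)
    return True
```

With a follower tail like [(3, ...), (5, ...)] and a leader whose entry at the same index carries a different term, the term-5 suffix is discarded and the leader's entries take its place, which is exactly the behavior that was missing for the demoted follower.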
[jira] [Updated] (KUDU-639) Leader doesn't overwrite demoted follower's log properly
[ https://issues.apache.org/jira/browse/KUDU-639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Percy updated KUDU-639:
----------------------------
    Fix Version/s: M5

This was fixed in 2015. Please file a separate Jira to track the task if it seems likely someone will add a test for this.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KUDU-2870) Checksum scan fails with "Not authorized" error when authz enabled
Mike Percy created KUDU-2870:
--------------------------------

             Summary: Checksum scan fails with "Not authorized" error when authz enabled
                 Key: KUDU-2870
                 URL: https://issues.apache.org/jira/browse/KUDU-2870
             Project: Kudu
          Issue Type: Bug
          Components: authz
    Affects Versions: 1.10.0
            Reporter: Mike Percy

While testing a Kudu 1.10.0 RC build with authorization enabled, I tried a checksum scan and it failed:

{noformat}
[mpercy@mpercy-c63s-0619-1 ~]$ kudu cluster ksck mpercy-c63s-0619-1.vpc.cloudera.com -tables=default.loadgen_auto_b527de07b2d842f3a3c82c5f85eb2854 -checksum_scan -sections=CHECKSUM_RESULTS
Checksum finished in 0s: 0/8 replicas remaining (0B from disk, 0 rows summed)

Checksum Summary
--- default.loadgen_auto_b527de07b2d842f3a3c82c5f85eb2854 ---
T 09d0df0ca48c41bf94c6a3a03533b811 P 7d31913f6bbf4355a974c76e4f82c72a (mpercy-c63s-0619-4.vpc.cloudera.com:7050): Error: Remote error: Not authorized: no authorization token presented
T 16cafc1e2e814b5fb988b22554ac306b P 11edb01b3b184a2da8586fa5cffda90c (mpercy-c63s-0619-3.vpc.cloudera.com:7050): Error: Remote error: Not authorized: no authorization token presented
T 37d40c90b0614b5d9515d1458e31657c P de47be31840f4b349f970cf759097cec (mpercy-c63s-0619-2.vpc.cloudera.com:7050): Error: Remote error: Not authorized: no authorization token presented
T 5da264a95d31474ea4b0b2e464a5b261 P de47be31840f4b349f970cf759097cec (mpercy-c63s-0619-2.vpc.cloudera.com:7050): Error: Remote error: Not authorized: no authorization token presented
T 6acad06c945942b5af696f7f59b4d2ea P 7d31913f6bbf4355a974c76e4f82c72a (mpercy-c63s-0619-4.vpc.cloudera.com:7050): Error: Remote error: Not authorized: no authorization token presented
T 949f20e82db1467fa9f968853c901f11 P 11edb01b3b184a2da8586fa5cffda90c (mpercy-c63s-0619-3.vpc.cloudera.com:7050): Error: Remote error: Not authorized: no authorization token presented
T 9eda0d8c267c44efb9d76cf8fb911f93 P 921f4e7e28274a9189c978162d604f2e (mpercy-c63s-0619-5.vpc.cloudera.com:7050): Error: Remote error: Not authorized: no authorization token presented
T a1ec4565447c4628baa6a1c5f9765c7a P 11edb01b3b184a2da8586fa5cffda90c (mpercy-c63s-0619-3.vpc.cloudera.com:7050): Error: Remote error: Not authorized: no authorization token presented

== Warnings: ==
Some masters have unsafe, experimental, or hidden flags set
Some tablet servers have unsafe, experimental, or hidden flags set

== Errors: ==
Aborted: checksum scan error: 8 errors were detected

FAILED
Runtime error: ksck discovered errors
{noformat}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KUDU-1575) Backup and restore procedures
[ https://issues.apache.org/jira/browse/KUDU-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy resolved KUDU-1575. -- Resolution: Fixed Fix Version/s: 1.10.0 Incremental backup / restore made it into 1.10.0. This still needs to be documented. > Backup and restore procedures > - > > Key: KUDU-1575 > URL: https://issues.apache.org/jira/browse/KUDU-1575 > Project: Kudu > Issue Type: Improvement > Components: master, tserver >Reporter: Mike Percy >Assignee: Mike Percy >Priority: Major > Labels: backup > Fix For: 1.10.0 > > > Kudu needs backup and restore procedures, both for data and for metadata. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KUDU-2832) Clean up after a failed restore job
[ https://issues.apache.org/jira/browse/KUDU-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2832: - Component/s: backup > Clean up after a failed restore job > --- > > Key: KUDU-2832 > URL: https://issues.apache.org/jira/browse/KUDU-2832 > Project: Kudu > Issue Type: Improvement > Components: backup >Reporter: Will Berkeley >Priority: Major > > If a restore job fails, it may leave a partially-restored table on the > destination cluster. This will prevent a naive retry from succeeding. We > should make more effort to clean up if a restore job fails, so that a simple > retry of the same job might be able to succeed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-2827) Backup should tombstone dropped tables
[ https://issues.apache.org/jira/browse/KUDU-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16850236#comment-16850236 ] Mike Percy commented on KUDU-2827: -- To add a little more context to this Jira, the implication is that we should have a way to determine whether a table was dropped or renamed, which would likely require additional master RPC API support, since we would need to be able to take a look at the current state of a table id. Table ids are used in the backup graph. The purpose is to properly handle dropped tables in the backup GC (backup cleanup) tool now merged as part of [https://github.com/apache/kudu/commit/a5a8da655ca8f0088dcd39301bd9bd87e182c460] > Backup should tombstone dropped tables > -- > > Key: KUDU-2827 > URL: https://issues.apache.org/jira/browse/KUDU-2827 > Project: Kudu > Issue Type: Task > Components: backup >Reporter: Mike Percy >Priority: Major > > It would be useful for backup to "tombstone" dropped tables so that the GC > process can detect this and eventually consider these eligible for deletion, > even though they are still on the restore path from a backup graph > perspective. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
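The tombstone-aware GC behavior being described can be sketched as follows (hypothetical data model and retention knob, not the actual backup tool's code; the real tool keys its backup graph by table id as noted above):

```python
import time

def gc_eligible(backups, tombstones, retention_secs, now=None):
    """Sketch of tombstone-aware backup GC.

    backups: dict of table_id -> list of backup end-times (epoch seconds).
    tombstones: dict of table_id -> time the table was observed dropped.
    A dropped table's backups become eligible for deletion once its
    tombstone has aged past the retention window, even though they are
    still reachable on a restore path in the backup graph.
    """
    now = time.time() if now is None else now
    eligible = []
    for table_id, times in backups.items():
        dropped_at = tombstones.get(table_id)
        if dropped_at is not None and now - dropped_at > retention_secs:
            eligible.extend((table_id, t) for t in times)
    return eligible
```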
[jira] [Created] (KUDU-2827) Backup should tombstone dropped tables
Mike Percy created KUDU-2827: Summary: Backup should tombstone dropped tables Key: KUDU-2827 URL: https://issues.apache.org/jira/browse/KUDU-2827 Project: Kudu Issue Type: Task Components: backup Reporter: Mike Percy It would be useful for backup to "tombstone" dropped tables so that the GC process can detect this and eventually consider these eligible for deletion, even though they are still on the restore path from a backup graph perspective. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-2810) Restore needs DELETE_IGNORE
[ https://issues.apache.org/jira/browse/KUDU-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832126#comment-16832126 ] Mike Percy commented on KUDU-2810: -- Another option – more of a workaround – would be to simply handle the Not Found error specifically in the Restore job. > Restore needs DELETE_IGNORE > --- > > Key: KUDU-2810 > URL: https://issues.apache.org/jira/browse/KUDU-2810 > Project: Kudu > Issue Type: Bug > Components: backup >Affects Versions: 1.9.0 >Reporter: Will Berkeley >Priority: Major > Fix For: 1.10.0 > > > If a restore task fails for any reason, and it's restoring an incremental > with DELETE row actions, when the task is retried it will fail any deletes > that happened on the previous task run. We need a DELETE_IGNORE write > operation to handle this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
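The workaround amounts to treating Not Found on a replayed delete as success, which makes the retry idempotent. A sketch, with `session_delete` standing in for the client's delete call (hypothetical, not a real Kudu API):

```python
class NotFound(Exception):
    """Stand-in for a row-not-found error from the storage client."""
    pass

def apply_delete_ignoring_missing(session_delete, key):
    """Workaround sketch for a missing DELETE_IGNORE operation: if the row
    is already gone, a previous attempt (or an earlier row action) must
    have deleted it, so the retry can safely treat it as done."""
    try:
        session_delete(key)
        return True       # row existed and was deleted now
    except NotFound:
        return False      # already gone: ignore, making the retry idempotent
```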
[jira] [Commented] (KUDU-2809) Incremental backup / diff scan does not handle rows that are inserted and deleted between two incrementals correctly
[ https://issues.apache.org/jira/browse/KUDU-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832108#comment-16832108 ] Mike Percy commented on KUDU-2809: -- +1 on the correct solution here being that diff scan should not return the deleted row at all if the insert of the row was the first operation that happened after the start timestamp of the diff scan and the end state was deleted. > Incremental backup / diff scan does not handle rows that are inserted and > deleted between two incrementals correctly > > > Key: KUDU-2809 > URL: https://issues.apache.org/jira/browse/KUDU-2809 > Project: Kudu > Issue Type: Bug > Components: backup >Affects Versions: 1.9.0 >Reporter: Will Berkeley >Priority: Major > > I did the following sequence of operations: > # Insert 100 million rows > # Update 1 out of every 11 rows > # Make a full backup > # Insert 100 million more rows, after the original rows in keyspace > # Delete 1 out of every 23 rows > # Make an incremental backup > Restore failed to apply the incremental backup, failing with an error like > {noformat} > java.lang.RuntimeException: failed to write 1000 rows from DataFrame to Kudu; > sample errors: > {noformat} > Due to another bug, there's no sample errors, but after hacking around that > bug, I found that the incremental contained a row with a DELETE action for a > key that is not present in the full backup. That's because the row was > inserted in step 4 and deleted in step 5, between backups. > We could fix this by > # Making diff scan not return a DELETE for such a row > # Implementing and using DELETE IGNORE in the restore job -- This message was sent by Atlassian JIRA (v7.6.3#76005)
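In other words, a diff scan between a start and an end timestamp should report each row's net change, and an insert followed by a delete inside the window nets out to nothing. An illustrative model of that collapsing (not Kudu's diff-scan implementation):

```python
def net_row_action(base_exists, ops):
    """Collapse a row's mutations within a diff-scan window into one net
    action.

    base_exists: whether the row existed at the window's start timestamp.
    ops: ordered mutations within the window: 'INSERT', 'UPDATE', 'DELETE'.
    Returns 'INSERT', 'UPDATE', 'DELETE', or None (no net change).
    """
    exists = base_exists
    for op in ops:
        exists = op != 'DELETE'   # any non-delete leaves the row live
    if not base_exists and exists:
        return 'INSERT'
    if base_exists and not exists:
        return 'DELETE'
    if base_exists and exists and ops:
        return 'UPDATE'
    return None   # row absent before and after: emit nothing
```

The problematic row from the report above is the first case in the tests: inserted and deleted between the two backups, so the incremental should carry no action for it at all.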
[jira] [Created] (KUDU-2801) Support exact-match timestamp for restore
Mike Percy created KUDU-2801: Summary: Support exact-match timestamp for restore Key: KUDU-2801 URL: https://issues.apache.org/jira/browse/KUDU-2801 Project: Kudu Issue Type: Task Components: backup Reporter: Mike Percy If a user wants to restore a backup at a specific timestamp, we should allow for a flag to pass an exact-match timestamp instead of just an upper-bound timestamp. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2797) Implement table size metrics
Mike Percy created KUDU-2797: Summary: Implement table size metrics Key: KUDU-2797 URL: https://issues.apache.org/jira/browse/KUDU-2797 Project: Kudu Issue Type: Task Components: master, metrics Affects Versions: 1.8.0 Reporter: Mike Percy It would be valuable to implement table size metrics for row count and byte size (pre-replication and post-replication). The master could aggregate these stats from the various partitions (tablets) and expose aggregated metrics for consumption by monitoring systems and dashboards. These same metrics would also be valuable to display on the web UI. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
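A sketch of the proposed master-side aggregation (the per-tablet stat shape and metric names here are assumptions, not existing Kudu metrics):

```python
def aggregate_table_metrics(tablet_stats, replication_factor):
    """Sum per-tablet row counts and logical byte sizes into table-level
    metrics, deriving a post-replication size from the replication factor."""
    rows = sum(s['rows'] for s in tablet_stats)
    size = sum(s['bytes'] for s in tablet_stats)
    return {
        'row_count': rows,
        'size_bytes_pre_replication': size,
        'size_bytes_post_replication': size * replication_factor,
    }
```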
[jira] [Commented] (KUDU-2692) Remove requirements for virtual columns to specify a read default and not be nullable
[ https://issues.apache.org/jira/browse/KUDU-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824423#comment-16824423 ] Mike Percy commented on KUDU-2692: -- This is low priority because all of the diff scan / incremental backup APIs are currently marked private; if we decided to make diff scan public this might be more important for usability. > Remove requirements for virtual columns to specify a read default and not be > nullable > - > > Key: KUDU-2692 > URL: https://issues.apache.org/jira/browse/KUDU-2692 > Project: Kudu > Issue Type: Improvement > Components: tablet >Reporter: Mike Percy >Priority: Minor > Labels: backup > > Virtual column types such as IS_DELETED currently require a read default to > be specified, in addition to not being allowed to be nullable. Consider > relaxing these requirements to improve the user experience when working with > virtual columns. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KUDU-2678) [Backup] Ensure the restore job can load the data in order
[ https://issues.apache.org/jira/browse/KUDU-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy resolved KUDU-2678. -- Resolution: Won't Do Fix Version/s: n/a For now, we'll close this out; we can reopen it if we suspect things have changed based on flush / compaction performance. > [Backup] Ensure the restore job can load the data in order > -- > > Key: KUDU-2678 > URL: https://issues.apache.org/jira/browse/KUDU-2678 > Project: Kudu > Issue Type: Improvement >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Minor > Labels: backup > Fix For: n/a > > > We need to adjust the Spark backup and restore jobs to be sure that we are > loading the data in sorted order. Not only is this useful for performance > today, but we may want to support some server side performance optimizations > in the future that depend on this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-2678) [Backup] Ensure the restore job can load the data in order
[ https://issues.apache.org/jira/browse/KUDU-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824418#comment-16824418 ] Mike Percy commented on KUDU-2678: -- Based on the results of scale testing by Will, this doesn't help performance overall. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KUDU-2670) Splitting more tasks for spark job, and add more concurrent for scan operation
[ https://issues.apache.org/jira/browse/KUDU-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Percy updated KUDU-2670:
-----------------------------
    Labels: performance (was: backup performance)

> Splitting more tasks for spark job, and add more concurrent for scan operation
> ------------------------------------------------------------------------------
>
>                 Key: KUDU-2670
>                 URL: https://issues.apache.org/jira/browse/KUDU-2670
>             Project: Kudu
>          Issue Type: Improvement
>          Components: java, spark
>    Affects Versions: 1.8.0
>            Reporter: yangz
>            Priority: Major
>              Labels: performance
>
> Refer to KUDU-2437, "Split a tablet into primary key ranges by size". We need a Java client implementation to support splitting the tablet scan operation. We suggest two new additions to the Java client:
> # A ConcurrentKuduScanner that lets multiple scanners read data at the same time. This is useful in one case in particular: the scan returns only one row, but the predicate doesn't contain the primary key, so we send many scanner requests and only one row comes back. Sending that many scanner requests one by one is slow, so we need a concurrent way. In the case we tested, on a 10G tablet, this saves a lot of time on a single machine.
> # A way to split a scan into more Spark tasks. To do so, we obtain scanner tokens in two steps: first we ask the tserver for key ranges, then with those ranges we get more scanner tokens. In our usage a tablet is 10G, but we split each task to process only 1G of data, so we get better performance.
> Both of these features have run well for us for half a year. We hope they will be useful for the community.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
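The second suggestion, splitting a tablet's keyspace so each Spark task covers a bounded amount of data, can be sketched with integers standing in for Kudu's binary primary-key ranges (illustrative only; the real client works with scan tokens):

```python
import math

def split_key_ranges(lo, hi, tablet_bytes, target_bytes):
    """Divide the key range [lo, hi) into roughly equal sub-ranges so that
    each scan task covers about `target_bytes` of a `tablet_bytes` tablet."""
    n = max(1, math.ceil(tablet_bytes / target_bytes))
    step = (hi - lo) / n
    bounds = [lo + round(i * step) for i in range(n)] + [hi]
    # Consecutive pairs of bounds form half-open scan ranges.
    return list(zip(bounds[:-1], bounds[1:]))
```

For the 10G tablet with 1G tasks described above, this yields ten sub-ranges, each of which could back one scan token and hence one Spark task.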
[jira] [Created] (KUDU-2795) Prevent cascading failures by detecting that disks are full and rejecting attempts to add additional replicas to a tablet server
Mike Percy created KUDU-2795: Summary: Prevent cascading failures by detecting that disks are full and rejecting attempts to add additional replicas to a tablet server Key: KUDU-2795 URL: https://issues.apache.org/jira/browse/KUDU-2795 Project: Kudu Issue Type: Task Components: master, tserver Affects Versions: 1.8.0 Reporter: Mike Percy Over the weekend a case was reported where the tablet server disks were near-full across a Kudu cluster. One finally reached the tipping point and crashed because the WAL disk was out of space and a write failed. This caused a cascading failure because the replicas on that tablet server were re-replicated to the rest of the cluster nodes, pushing them beyond the tipping point and eventually the whole cluster crashed. We could potentially prevent the cascading failure by detecting that a tablet server is nearly full and reject or prevent attempts to move additional replicas to that server while it is in the "yellow zone" of disk space availability, preferring under-replicated tablets over an unavailable cluster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
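The proposed protection boils down to a free-space admission threshold on the tablet server (the "yellow zone" fraction below is an assumed knob, not an existing Kudu flag):

```python
def accept_new_replica(available_bytes, capacity_bytes, yellow_zone_pct=0.10):
    """Sketch of the proposed admission check: refuse to host an additional
    replica while free space is inside the yellow zone, so re-replication
    waves cannot push a nearly full server over the edge."""
    return available_bytes > capacity_bytes * yellow_zone_pct
```

Under this policy a tablet left under-replicated is the deliberate trade-off: it is preferred over cascading crashes that make the whole cluster unavailable.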
[jira] [Created] (KUDU-2794) Document how to identify and deal with KUDU-2233 corruption
Mike Percy created KUDU-2794: Summary: Document how to identify and deal with KUDU-2233 corruption Key: KUDU-2794 URL: https://issues.apache.org/jira/browse/KUDU-2794 Project: Kudu Issue Type: Task Components: documentation, tablet Reporter: Mike Percy Document how to identify and deal with KUDU-2233 corruption. This would benefit from a tool to detect KUDU-2233 corruption like the one discussed in KUDU-2793. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2793) Design a scan to detect KUDU-2233 corruption in a replica
Mike Percy created KUDU-2793: Summary: Design a scan to detect KUDU-2233 corruption in a replica Key: KUDU-2793 URL: https://issues.apache.org/jira/browse/KUDU-2793 Project: Kudu Issue Type: Task Components: tablet Affects Versions: 1.8.0 Reporter: Mike Percy We should design a scan to detect corruption in a replica as a result of KUDU-2233. This may simply be a checksum scan, which we already support, but that has not been verified. Today, when compaction is triggered in a KUDU-2233 corrupted replica, the tablet server will crash with a CHECK error. Ideally, when this detection scan notices such a corruption, it would cause the corrupt local replica to enter a FAILED tablet state. However, causing a crash might also be acceptable in controlled scenarios. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2792) Automatically retry failed bootstrap on tablets that failed to start due to disk space
Mike Percy created KUDU-2792:
--------------------------------

             Summary: Automatically retry failed bootstrap on tablets that failed to start due to disk space
                 Key: KUDU-2792
                 URL: https://issues.apache.org/jira/browse/KUDU-2792
             Project: Kudu
          Issue Type: Task
          Components: tserver
    Affects Versions: 1.8.0
            Reporter: Mike Percy

If a tablet replica fails to bootstrap due to insufficient disk space to replay the WAL, it will remain in a state that looks like this in ksck, even if the user frees up disk space:

{noformat}
5edf82f0516b4897b3a7991a7e67d71c (host1.example.com:7050): not running [LEADER]
State: FAILED
Data state: TABLET_DATA_READY
Last status: IO error: Failed log replay. Reason: Failed to open new log: Insufficient disk space to allocate 8388608 bytes under path /data/1/kudu/tablet/wal/wals/5807c5100e0d4522a66e32efbb29d57e/.kudutmp.newsegmentzGFKEg (7939936256 bytes available vs 19993874923 bytes reserved) (error 28)
{noformat}

Today, this requires a tablet server restart to recover from. It should be possible for a tablet server (i.e. the TsTabletManager) to detect that the failure was temporary, not permanent, and retry the failed bootstrap later on when additional disk space has been freed. From a programming perspective, that may require dealing with some object lifecycle issues (i.e. not reusing the Tablet object from the failed bootstrap).

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
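The retry policy being proposed can be sketched as a simple predicate (the error classification and parameters are hypothetical, not TsTabletManager's actual API):

```python
def should_retry_bootstrap(error_kind, wal_bytes_needed, available_bytes):
    """A bootstrap that failed for lack of disk space is a temporary
    failure, retryable once enough space has been freed; other failure
    kinds stay FAILED until an operator intervenes."""
    if error_kind != 'no_disk_space':
        return False
    return available_bytes >= wal_bytes_needed
```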
[jira] [Created] (KUDU-2783) ksck: indicate whether a tablet replica is recovering
Mike Percy created KUDU-2783: Summary: ksck: indicate whether a tablet replica is recovering Key: KUDU-2783 URL: https://issues.apache.org/jira/browse/KUDU-2783 Project: Kudu Issue Type: Task Components: ops-tooling Reporter: Mike Percy Got the following feedback from someone running Kudu: add an indicator to the ksck output showing whether a table or replica is getting better or not, potentially by looking at whether the replica is bootstrapping. Something like 'this one will retry' vs. 'this one is not trying anymore'. One way to do this would be to consider certain tablet data state + tablet state combinations, such as INITIALIZING / BOOTSTRAPPING or COPYING, as recovering, and the rest of the bad ones as not making progress. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
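The suggested classification might look like this (the state names follow the ones mentioned above, but the exact rule is a sketch, not existing ksck behavior):

```python
# States in which a bad replica is still making progress toward recovery.
RECOVERING_STATES = {'INITIALIZING', 'BOOTSTRAPPING', 'COPYING'}

def replica_progress(tablet_state):
    """Classify a replica for the proposed ksck indicator: 'this one will
    retry' (recovering) vs. 'this one is not trying anymore'."""
    if tablet_state == 'RUNNING':
        return 'healthy'
    if tablet_state in RECOVERING_STATES:
        return 'recovering'
    return 'not making progress'
```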
[jira] [Commented] (KUDU-2655) Add metrics for metadata directory I/O
[ https://issues.apache.org/jira/browse/KUDU-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823538#comment-16823538 ] Mike Percy commented on KUDU-2655: -- This would be equally useful for performance questions around consensus metadata flush, which happens for configuration changes, leader changes, and voting. > Add metrics for metadata directory I/O > -- > > Key: KUDU-2655 > URL: https://issues.apache.org/jira/browse/KUDU-2655 > Project: Kudu > Issue Type: Improvement > Components: metrics >Affects Versions: 1.8.0 >Reporter: Will Berkeley >Assignee: Will Berkeley >Priority: Major > > There's good metrics for block manager (data dir) and WAL operations, like > {{block_manager_total_bytes_written}}, {{block_manager_total_bytes_read}}, > {{log_bytes_logged }}, and the {{log_append_latency}} histogram. What we are > missing are metrics about the amount of metadata I/O. It'd be nice to add > * metadata_bytes_read > * metadata_bytes_written > * latency histograms for bytes read and bytes written -- This message was sent by Atlassian JIRA (v7.6.3#76005)
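A sketch of the proposed metrics, with plain counters and a list standing in for a latency histogram (the recorder class is hypothetical; only the metric names come from the issue description):

```python
class MetadataIoMetrics:
    """Track bytes and per-call latency for metadata directory I/O."""

    def __init__(self):
        self.metadata_bytes_read = 0
        self.metadata_bytes_written = 0
        self.read_latencies = []    # stand-in for a latency histogram
        self.write_latencies = []

    def record_read(self, nbytes, seconds):
        self.metadata_bytes_read += nbytes
        self.read_latencies.append(seconds)

    def record_write(self, nbytes, seconds):
        self.metadata_bytes_written += nbytes
        self.write_latencies.append(seconds)
```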
[jira] [Created] (KUDU-2782) Implement distributed tracing support in Kudu
Mike Percy created KUDU-2782: Summary: Implement distributed tracing support in Kudu Key: KUDU-2782 URL: https://issues.apache.org/jira/browse/KUDU-2782 Project: Kudu Issue Type: Task Components: ops-tooling Reporter: Mike Percy It would be useful to implement distributed tracing support in Kudu, especially something like OpenTracing support that we could use with Zipkin, Jaeger, DataDog, etc. Particularly useful would be auto-sampled and on-demand traces of write RPCs since that would help us identify slow nodes or hotspots in the replication group and troubleshoot performance and stability issues. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KUDU-2727) Contention on the Raft consensus lock can cause tablet service queue overflows
[ https://issues.apache.org/jira/browse/KUDU-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Percy reassigned KUDU-2727:
--------------------------------
    Assignee: Mike Percy

> Contention on the Raft consensus lock can cause tablet service queue overflows
> ------------------------------------------------------------------------------
>
>                 Key: KUDU-2727
>                 URL: https://issues.apache.org/jira/browse/KUDU-2727
>             Project: Kudu
>          Issue Type: Improvement
>            Reporter: Will Berkeley
>            Assignee: Mike Percy
>            Priority: Major
>
> Here are stacks illustrating the phenomenon:
> {noformat}
> tids=[2201]
>   0x379ba0f710
>    0x1fb951a base::internal::SpinLockDelay()
>    0x1fb93b7 base::SpinLock::SlowLock()
>     0xb4e68e kudu::consensus::Peer::SignalRequest()
>     0xb9c0df kudu::consensus::PeerManager::SignalRequest()
>     0xb8c178 kudu::consensus::RaftConsensus::Replicate()
>     0xaab816 kudu::tablet::TransactionDriver::Prepare()
>     0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
>    0x1fa37ed kudu::ThreadPool::DispatchThread()
>    0x1f9c2a1 kudu::Thread::SuperviseThread()
>   0x379ba079d1 start_thread
>   0x379b6e88fd clone
>
> tids=[4515]
>   0x379ba0f710
>    0x1fb951a base::internal::SpinLockDelay()
>    0x1fb93b7 base::SpinLock::SlowLock()
>     0xb74c60 kudu::consensus::RaftConsensus::NotifyCommitIndex()
>     0xb59307 kudu::consensus::PeerMessageQueue::NotifyObserversTask()
>     0xb54058 _ZN4kudu8internal7InvokerILi2ENS0_9BindStateINS0_15RunnableAdapterIMNS_9consensus16PeerMessageQueueEFvRKSt8functionIFvPNS4_24PeerMessageQueueObserverEEFvPS5_SC_EFvNS0_17UnretainedWrapperIS5_EEZNS5_34NotifyObserversOfCommitIndexChangeElEUlS8_E_EEESH_E3RunEPNS0_13BindStateBaseE
>    0x1fa37ed kudu::ThreadPool::DispatchThread()
>    0x1f9c2a1 kudu::Thread::SuperviseThread()
>   0x379ba079d1 start_thread
>   0x379b6e88fd clone
>
> tids=[22185,22194,22193,22188,22187,22186]
>   0x379ba0f710
>    0x1fb951a base::internal::SpinLockDelay()
>    0x1fb93b7 base::SpinLock::SlowLock()
>     0xb8bff8 kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()
>     0xaaaef9 kudu::tablet::TransactionDriver::ExecuteAsync()
>     0xaa3742 kudu::tablet::TabletReplica::SubmitWrite()
>     0x92812d kudu::tserver::TabletServiceImpl::Write()
>    0x1e28f3c kudu::rpc::GeneratedServiceIf::Handle()
>    0x1e2986a kudu::rpc::ServicePool::RunThread()
>    0x1f9c2a1 kudu::Thread::SuperviseThread()
>   0x379ba079d1 start_thread
>   0x379b6e88fd clone
>
> tids=[22192,22191]
>   0x379ba0f710
>    0x1fb951a base::internal::SpinLockDelay()
>    0x1fb93b7 base::SpinLock::SlowLock()
>    0x1e13dec kudu::rpc::ResultTracker::TrackRpc()
>    0x1e28ef5 kudu::rpc::GeneratedServiceIf::Handle()
>    0x1e2986a kudu::rpc::ServicePool::RunThread()
>    0x1f9c2a1 kudu::Thread::SuperviseThread()
>   0x379ba079d1 start_thread
>   0x379b6e88fd clone
>
> tids=[4426]
>   0x379ba0f710
>    0x206d3d0
>    0x212fd25 google::protobuf::Message::SpaceUsedLong()
>    0x211dee4 google::protobuf::internal::GeneratedMessageReflection::SpaceUsedLong()
>     0xb6658e kudu::consensus::LogCache::AppendOperations()
>     0xb5c539 kudu::consensus::PeerMessageQueue::AppendOperations()
>     0xb5c7c7 kudu::consensus::PeerMessageQueue::AppendOperation()
>     0xb7c675 kudu::consensus::RaftConsensus::AppendNewRoundToQueueUnlocked()
>     0xb8c147 kudu::consensus::RaftConsensus::Replicate()
>     0xaab816 kudu::tablet::TransactionDriver::Prepare()
>     0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
>    0x1fa37ed kudu::ThreadPool::DispatchThread()
>    0x1f9c2a1 kudu::Thread::SuperviseThread()
>   0x379ba079d1 start_thread
>   0x379b6e88fd clone
> {noformat}
> {{kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()}} needs to take the lock to check the term and the Raft role. When many RPCs come in for the same tablet, the contention can hog service threads and cause queue overflows on busy systems.
> Yugabyte switched their equivalent lock to be an atomic that allows them to read the term and role wait-free.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
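The Yugabyte approach mentioned in the description packs the term and role into a single word so readers can load both atomically without taking the consensus lock. A Python model of just the packing scheme (a real implementation would hold the word in a std::atomic<uint64_t> in C++; the 3-bit role width is an assumption):

```python
def pack_term_role(term, role):
    """Pack a Raft term and role code into one integer word: the low 3 bits
    hold the role, the remaining bits hold the term. A single atomic load
    of this word then yields both values consistently, wait-free."""
    assert 0 <= role < 8  # 3 bits reserved for the role code
    return (term << 3) | role

def unpack_term_role(word):
    """Inverse of pack_term_role: recover (term, role) from the word."""
    return word >> 3, word & 0b111
```

Writers (leadership changes, term bumps) would store a freshly packed word; a hot path like CheckLeadershipAndBindTerm() would then only need a single atomic load instead of acquiring the spinlock seen in the stacks above.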
[jira] [Commented] (KUDU-2727) Contention on the Raft consensus lock can cause tablet service queue overflows
[ https://issues.apache.org/jira/browse/KUDU-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16802279#comment-16802279 ] Mike Percy commented on KUDU-2727: -- I'm going to look at this in my spare time > Contention on the Raft consensus lock can cause tablet service queue overflows > -- > > Key: KUDU-2727 > URL: https://issues.apache.org/jira/browse/KUDU-2727 > Project: Kudu > Issue Type: Improvement >Reporter: Will Berkeley >Assignee: Mike Percy >Priority: Major > > Here's stacks illustrating the phenomenon: > {noformat} > tids=[2201] > 0x379ba0f710 >0x1fb951a base::internal::SpinLockDelay() >0x1fb93b7 base::SpinLock::SlowLock() > 0xb4e68e kudu::consensus::Peer::SignalRequest() > 0xb9c0df kudu::consensus::PeerManager::SignalRequest() > 0xb8c178 kudu::consensus::RaftConsensus::Replicate() > 0xaab816 kudu::tablet::TransactionDriver::Prepare() > 0xaac0ed kudu::tablet::TransactionDriver::PrepareTask() >0x1fa37ed kudu::ThreadPool::DispatchThread() >0x1f9c2a1 kudu::Thread::SuperviseThread() > 0x379ba079d1 start_thread > 0x379b6e88fd clone > tids=[4515] > 0x379ba0f710 >0x1fb951a base::internal::SpinLockDelay() >0x1fb93b7 base::SpinLock::SlowLock() > 0xb74c60 kudu::consensus::RaftConsensus::NotifyCommitIndex() > 0xb59307 kudu::consensus::PeerMessageQueue::NotifyObserversTask() > 0xb54058 > _ZN4kudu8internal7InvokerILi2ENS0_9BindStateINS0_15RunnableAdapterIMNS_9consensus16PeerMessageQueueEFvRKSt8functionIFvPNS4_24PeerMessageQueueObserverEEFvPS5_SC_EFvNS0_17UnretainedWrapperIS5_EEZNS5_34NotifyObserversOfCommitIndexChangeElEUlS8_E_EEESH_E3RunEPNS0_13BindStateBaseE >0x1fa37ed kudu::ThreadPool::DispatchThread() >0x1f9c2a1 kudu::Thread::SuperviseThread() > 0x379ba079d1 start_thread > 0x379b6e88fd clone > tids=[22185,22194,22193,22188,22187,22186] > 0x379ba0f710 >0x1fb951a base::internal::SpinLockDelay() >0x1fb93b7 base::SpinLock::SlowLock() > 0xb8bff8 > kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm() > 0xaaaef9 
kudu::tablet::TransactionDriver::ExecuteAsync() > 0xaa3742 kudu::tablet::TabletReplica::SubmitWrite() > 0x92812d kudu::tserver::TabletServiceImpl::Write() >0x1e28f3c kudu::rpc::GeneratedServiceIf::Handle() >0x1e2986a kudu::rpc::ServicePool::RunThread() >0x1f9c2a1 kudu::Thread::SuperviseThread() > 0x379ba079d1 start_thread > 0x379b6e88fd clone > tids=[22192,22191] > 0x379ba0f710 >0x1fb951a base::internal::SpinLockDelay() >0x1fb93b7 base::SpinLock::SlowLock() >0x1e13dec kudu::rpc::ResultTracker::TrackRpc() >0x1e28ef5 kudu::rpc::GeneratedServiceIf::Handle() >0x1e2986a kudu::rpc::ServicePool::RunThread() >0x1f9c2a1 kudu::Thread::SuperviseThread() > 0x379ba079d1 start_thread > 0x379b6e88fd clone > tids=[4426] > 0x379ba0f710 >0x206d3d0 >0x212fd25 google::protobuf::Message::SpaceUsedLong() >0x211dee4 > google::protobuf::internal::GeneratedMessageReflection::SpaceUsedLong() > 0xb6658e kudu::consensus::LogCache::AppendOperations() > 0xb5c539 kudu::consensus::PeerMessageQueue::AppendOperations() > 0xb5c7c7 kudu::consensus::PeerMessageQueue::AppendOperation() > 0xb7c675 > kudu::consensus::RaftConsensus::AppendNewRoundToQueueUnlocked() > 0xb8c147 kudu::consensus::RaftConsensus::Replicate() > 0xaab816 kudu::tablet::TransactionDriver::Prepare() > 0xaac0ed kudu::tablet::TransactionDriver::PrepareTask() >0x1fa37ed kudu::ThreadPool::DispatchThread() >0x1f9c2a1 kudu::Thread::SuperviseThread() > 0x379ba079d1 start_thread > 0x379b6e88fd clone > {noformat} > {{kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()}} needs to > take the lock to check the term and the Raft role. When many RPCs come in for > the same tablet, the contention can hog service threads and cause queue > overflows on busy systems. > Yugabyte switched their equivalent lock to be an atomic that allows them to > read the term and role wait-free. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
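The wait-free alternative mentioned at the end of the report can be sketched as follows. This is an illustrative C++ sketch only, not Kudu's or Yugabyte's actual code: the term and role are packed into a single 64-bit atomic so the hot-path leadership check never acquires the consensus lock; the names (`LeadershipState`, `CheckLeadershipAndBindTerm`) mirror the stack trace but the implementation is hypothetical.

```cpp
// Illustrative sketch only, not Kudu's or Yugabyte's actual code: pack the
// Raft term and role into one 64-bit atomic so the hot-path leadership
// check is wait-free and never touches the consensus lock.
#include <atomic>
#include <cstdint>

enum class RaftRole : uint8_t { kFollower = 0, kLeader = 1, kLearner = 2 };

class LeadershipState {
 public:
  // Writers are rare (term/role changes) and would still hold the
  // consensus lock; they publish the new state with a release store.
  void Update(int64_t term, RaftRole role) {
    state_.store(Pack(term, role), std::memory_order_release);
  }

  // Readers run on every incoming write RPC. A single acquire load replaces
  // the SpinLock acquisition seen in the stacks above.
  bool CheckLeadershipAndBindTerm(int64_t* bound_term) const {
    const uint64_t s = state_.load(std::memory_order_acquire);
    if (RoleOf(s) != RaftRole::kLeader) return false;
    *bound_term = TermOf(s);
    return true;
  }

 private:
  static uint64_t Pack(int64_t term, RaftRole role) {
    return (static_cast<uint64_t>(term) << 8) |
           static_cast<uint64_t>(static_cast<uint8_t>(role));
  }
  static int64_t TermOf(uint64_t s) { return static_cast<int64_t>(s >> 8); }
  static RaftRole RoleOf(uint64_t s) { return static_cast<RaftRole>(s & 0xff); }

  std::atomic<uint64_t> state_{0};  // term 0, follower
};
```

Only the readers bypass the lock; writers still serialize term and role changes, which is why a plain release/acquire pair is enough here.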
[jira] [Updated] (KUDU-2744) Add RPC support for diff scans
[ https://issues.apache.org/jira/browse/KUDU-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2744: - Status: In Review (was: Open) > Add RPC support for diff scans > -- > > Key: KUDU-2744 > URL: https://issues.apache.org/jira/browse/KUDU-2744 > Project: Kudu > Issue Type: Task > Components: backup >Reporter: Mike Percy >Assignee: Mike Percy >Priority: Major > > Add RPC support for diff scans -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KUDU-2744) Add RPC support for diff scans
[ https://issues.apache.org/jira/browse/KUDU-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2744: - Resolution: Fixed Fix Version/s: 1.10.0 Status: Resolved (was: In Review) Merged as e8be768 > Add RPC support for diff scans > -- > > Key: KUDU-2744 > URL: https://issues.apache.org/jira/browse/KUDU-2744 > Project: Kudu > Issue Type: Task > Components: backup >Reporter: Mike Percy >Assignee: Mike Percy >Priority: Major > Fix For: 1.10.0 > > > Add RPC support for diff scans -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KUDU-2744) Add RPC support for diff scans
[ https://issues.apache.org/jira/browse/KUDU-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2744: - Code Review: https://gerrit.cloudera.org/c/12592/ > Add RPC support for diff scans > -- > > Key: KUDU-2744 > URL: https://issues.apache.org/jira/browse/KUDU-2744 > Project: Kudu > Issue Type: Task > Components: backup >Reporter: Mike Percy >Assignee: Mike Percy >Priority: Major > > Add RPC support for diff scans -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2744) Add RPC support for diff scans
Mike Percy created KUDU-2744: Summary: Add RPC support for diff scans Key: KUDU-2744 URL: https://issues.apache.org/jira/browse/KUDU-2744 Project: Kudu Issue Type: Task Components: backup Reporter: Mike Percy Assignee: Mike Percy Add RPC support for diff scans -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KUDU-2741) Failure in TestMergeIterator.TestDeDupGhostRows
[ https://issues.apache.org/jira/browse/KUDU-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy resolved KUDU-2741. -- Resolution: Fixed Fix Version/s: 1.10.0 Fixed in d17e3ef345498777e32f2b275f952abac1369a7a > Failure in TestMergeIterator.TestDeDupGhostRows > --- > > Key: KUDU-2741 > URL: https://issues.apache.org/jira/browse/KUDU-2741 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Will Berkeley >Assignee: Mike Percy >Priority: Major > Fix For: 1.10.0 > > > Test log of reproducible failure below: > {noformat} > $ bin/generic_iterators-test --gtest_filter="*DeDup*" > --gtest_random_seed=1615295598 > Note: Google Test filter = *DeDup* > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. > [--] 1 test from TestMergeIterator > [ RUN ] TestMergeIterator.TestDeDupGhostRows > WARNING: Logging before InitGoogleLogging() is written to STDERR > I0311 13:16:42.837129 199316928 test_util.cc:212] Using random seed: > 1078076534 > I0311 13:16:42.839583 199316928 generic_iterators-test.cc:317] Time spent > sorting the expected results: real 0.000s user 0.000s sys 0.000s > I0311 13:16:42.839709 199316928 generic_iterators-test.cc:321] Time spent > shuffling the inputs: real 0.000s user 0.000s sys 0.000s > I0311 13:16:42.839901 199316928 generic_iterators-test.cc:346] Predicate: val > >= AND val < > ../../src/kudu/common/generic_iterators-test.cc:366: Failure > Expected: expected[total_idx] > Which is: 10264066 > To be equal to: row_val > Which is: 10282492 > Yielded out of order at idx 1823 > I0311 13:16:42.848778 199316928 generic_iterators-test.cc:348] Time spent > iterating merged lists: real 0.009s user 0.009s sys 0.000s > ../../src/kudu/common/generic_iterators-test.cc:414: Failure > Expected: TestMerge(kIntSchemaWithVCol, match_all_pred, true, true) doesn't > generate new fatal failures in the current thread. > Actual: it does. 
> [ FAILED ] TestMergeIterator.TestDeDupGhostRows (11 ms) > [--] 1 test from TestMergeIterator (11 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (12 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] TestMergeIterator.TestDeDupGhostRows > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KUDU-2741) Failure in TestMergeIterator.TestDeDupGhostRows
[ https://issues.apache.org/jira/browse/KUDU-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy reassigned KUDU-2741: Assignee: Mike Percy > Failure in TestMergeIterator.TestDeDupGhostRows > --- > > Key: KUDU-2741 > URL: https://issues.apache.org/jira/browse/KUDU-2741 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Will Berkeley >Assignee: Mike Percy >Priority: Major > > Test log of reproducible failure below: > {noformat} > $ bin/generic_iterators-test --gtest_filter="*DeDup*" > --gtest_random_seed=1615295598 > Note: Google Test filter = *DeDup* > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. > [--] 1 test from TestMergeIterator > [ RUN ] TestMergeIterator.TestDeDupGhostRows > WARNING: Logging before InitGoogleLogging() is written to STDERR > I0311 13:16:42.837129 199316928 test_util.cc:212] Using random seed: > 1078076534 > I0311 13:16:42.839583 199316928 generic_iterators-test.cc:317] Time spent > sorting the expected results: real 0.000s user 0.000s sys 0.000s > I0311 13:16:42.839709 199316928 generic_iterators-test.cc:321] Time spent > shuffling the inputs: real 0.000s user 0.000s sys 0.000s > I0311 13:16:42.839901 199316928 generic_iterators-test.cc:346] Predicate: val > >= AND val < > ../../src/kudu/common/generic_iterators-test.cc:366: Failure > Expected: expected[total_idx] > Which is: 10264066 > To be equal to: row_val > Which is: 10282492 > Yielded out of order at idx 1823 > I0311 13:16:42.848778 199316928 generic_iterators-test.cc:348] Time spent > iterating merged lists: real 0.009s user 0.009s sys 0.000s > ../../src/kudu/common/generic_iterators-test.cc:414: Failure > Expected: TestMerge(kIntSchemaWithVCol, match_all_pred, true, true) doesn't > generate new fatal failures in the current thread. > Actual: it does. 
> [ FAILED ] TestMergeIterator.TestDeDupGhostRows (11 ms) > [--] 1 test from TestMergeIterator (11 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (12 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] TestMergeIterator.TestDeDupGhostRows > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-2741) Failure in TestMergeIterator.TestDeDupGhostRows
[ https://issues.apache.org/jira/browse/KUDU-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16789932#comment-16789932 ] Mike Percy commented on KUDU-2741: -- Thanks for filing – I'm looking at this. > Failure in TestMergeIterator.TestDeDupGhostRows > --- > > Key: KUDU-2741 > URL: https://issues.apache.org/jira/browse/KUDU-2741 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Will Berkeley >Assignee: Mike Percy >Priority: Major > > Test log of reproducible failure below: > {noformat} > $ bin/generic_iterators-test --gtest_filter="*DeDup*" > --gtest_random_seed=1615295598 > Note: Google Test filter = *DeDup* > [==] Running 1 test from 1 test case. > [--] Global test environment set-up. > [--] 1 test from TestMergeIterator > [ RUN ] TestMergeIterator.TestDeDupGhostRows > WARNING: Logging before InitGoogleLogging() is written to STDERR > I0311 13:16:42.837129 199316928 test_util.cc:212] Using random seed: > 1078076534 > I0311 13:16:42.839583 199316928 generic_iterators-test.cc:317] Time spent > sorting the expected results: real 0.000s user 0.000s sys 0.000s > I0311 13:16:42.839709 199316928 generic_iterators-test.cc:321] Time spent > shuffling the inputs: real 0.000s user 0.000s sys 0.000s > I0311 13:16:42.839901 199316928 generic_iterators-test.cc:346] Predicate: val > >= AND val < > ../../src/kudu/common/generic_iterators-test.cc:366: Failure > Expected: expected[total_idx] > Which is: 10264066 > To be equal to: row_val > Which is: 10282492 > Yielded out of order at idx 1823 > I0311 13:16:42.848778 199316928 generic_iterators-test.cc:348] Time spent > iterating merged lists: real 0.009s user 0.009s sys 0.000s > ../../src/kudu/common/generic_iterators-test.cc:414: Failure > Expected: TestMerge(kIntSchemaWithVCol, match_all_pred, true, true) doesn't > generate new fatal failures in the current thread. > Actual: it does. 
> [ FAILED ] TestMergeIterator.TestDeDupGhostRows (11 ms) > [--] 1 test from TestMergeIterator (11 ms total) > [--] Global test environment tear-down > [==] 1 test from 1 test case ran. (12 ms total) > [ PASSED ] 0 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] TestMergeIterator.TestDeDupGhostRows > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KUDU-2645) Diff scanner should perform a merge on the rowset iterators at scan time
[ https://issues.apache.org/jira/browse/KUDU-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2645: - Resolution: Fixed Fix Version/s: 1.10.0 Status: Resolved (was: In Review) Merged as 5953357 > Diff scanner should perform a merge on the rowset iterators at scan time > > > Key: KUDU-2645 > URL: https://issues.apache.org/jira/browse/KUDU-2645 > Project: Kudu > Issue Type: New Feature > Components: tablet >Reporter: Mike Percy >Assignee: Mike Percy >Priority: Major > Fix For: 1.10.0 > > > In order to perform a diff scan we will need the MergeIterator to ensure that > duplicate ghost rows are not returned in cases where a row was deleted and > flushed, then reinserted into a new rowset during the time period covered by > the diff scan. In such a case, only one representation of the row should be > returned, which is the reinserted one. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
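The de-duplication requirement described above can be illustrated with a small sketch (hypothetical types, not the actual MergeIterator API): among rows with the same key yielded by different rowset iterators, a live reinserted copy must win over a deleted ghost, and only one representation is returned.

```cpp
// Illustrative sketch with hypothetical types (not the MergeIterator API):
// when several rowsets yield a row with the same key during a diff scan,
// exactly one representation is kept, and a live (reinserted) copy wins
// over a deleted ghost.
#include <cstdint>
#include <map>
#include <vector>

struct Row {
  int64_t key;
  bool is_deleted_ghost;  // deleted within the window covered by the scan
  int payload;
};

std::vector<Row> MergeDeDup(const std::vector<std::vector<Row>>& rowsets) {
  std::map<int64_t, Row> merged;  // key-ordered, like the merge heap output
  for (const auto& rowset : rowsets) {
    for (const auto& row : rowset) {
      auto it = merged.find(row.key);
      if (it == merged.end()) {
        merged.emplace(row.key, row);
      } else if (it->second.is_deleted_ghost && !row.is_deleted_ghost) {
        it->second = row;  // the reinserted row replaces its ghost
      }
    }
  }
  std::vector<Row> out;
  for (const auto& kv : merged) out.push_back(kv.second);
  return out;
}
```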
[jira] [Created] (KUDU-2740) TabletCopyITest.TestTabletCopyingDeletedTabletFails flaky due to lack of leader election retries
Mike Percy created KUDU-2740: Summary: TabletCopyITest.TestTabletCopyingDeletedTabletFails flaky due to lack of leader election retries Key: KUDU-2740 URL: https://issues.apache.org/jira/browse/KUDU-2740 Project: Kudu Issue Type: Bug Affects Versions: 1.9.0 Reporter: Mike Percy This test can be flaky because we disable failure detection and neglect to retry the leader election. An example error looks like this: {code:java} I0307 01:24:56.238428 5333 tablet_service.cc:1239] Received Run Leader Election RPC: tablet_id: "e75f819cfb0a45c483899e2396b3a07a" dest_uuid: "89a95beac49a43d0b02b662e1a228337" from {username='slave'} at 127.0.0.1:48832 I0307 01:24:56.238809 5333 raft_consensus.cc:472] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [term 0 FOLLOWER]: Starting forced leader election (received explicit request) I0307 01:24:56.238982 5333 raft_consensus.cc:2886] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [term 0 FOLLOWER]: Advancing to term 1 W0307 01:24:56.255393 5500 tablet.cc:1786] T e75f819cfb0a45c483899e2396b3a07a P 7509a1eca3f14f45903715fdb6a20f77: Can't schedule compaction. Clean time has not been advanced past its initial value. W0307 01:24:56.261915 5377 tablet.cc:1786] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337: Can't schedule compaction. Clean time has not been advanced past its initial value. W0307 01:24:56.291085 5622 tablet.cc:1786] T e75f819cfb0a45c483899e2396b3a07a P 0caf13c7f5a64af781811ca30ab3656d: Can't schedule compaction. Clean time has not been advanced past its initial value. 
W0307 01:24:58.477632 5333 consensus_meta.cc:220] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337: Time spent flushing consensus metadata: real 2.238s user 0.003s sys 0.000s I0307 01:24:58.477829 5333 raft_consensus.cc:494] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [term 1 FOLLOWER]: Starting forced leader election with config: opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: "89a95beac49a43d0b02b662e1a228337" member_type: VOTER last_known_addr { host: "127.4.141.65" port: 44695 } } peers { permanent_uuid: "0caf13c7f5a64af781811ca30ab3656d" member_type: VOTER last_known_addr { host: "127.4.141.67" port: 32845 } } peers { permanent_uuid: "7509a1eca3f14f45903715fdb6a20f77" member_type: VOTER last_known_addr { host: "127.4.141.66" port: 35595 } } I0307 01:24:58.479737 5333 leader_election.cc:296] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [CANDIDATE]: Term 1 election: Requested vote from peers 0caf13c7f5a64af781811ca30ab3656d (127.4.141.67:32845), 7509a1eca3f14f45903715fdb6a20f77 (127.4.141.66:35595) I0307 01:24:58.480012 5333 rpcz_store.cc:269] Call kudu.consensus.ConsensusService.RunLeaderElection from 127.0.0.1:48832 (request call id 3) took 2241ms. 
Request Metrics: {"dns_us":93} I0307 01:24:58.487798 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0 I0307 01:24:58.493844 5578 tablet_service.cc:1122] Received RequestConsensusVote() RPC: tablet_id: "e75f819cfb0a45c483899e2396b3a07a" candidate_uuid: "89a95beac49a43d0b02b662e1a228337" candidate_term: 1 candidate_status { last_received { term: 0 index: 0 } } ignore_live_leader: true dest_uuid: "0caf13c7f5a64af781811ca30ab3656d" I0307 01:24:58.494168 5578 raft_consensus.cc:2886] T e75f819cfb0a45c483899e2396b3a07a P 0caf13c7f5a64af781811ca30ab3656d [term 0 FOLLOWER]: Advancing to term 1 I0307 01:24:58.494354 5456 tablet_service.cc:1122] Received RequestConsensusVote() RPC: tablet_id: "e75f819cfb0a45c483899e2396b3a07a" candidate_uuid: "89a95beac49a43d0b02b662e1a228337" candidate_term: 1 candidate_status { last_received { term: 0 index: 0 } } ignore_live_leader: true dest_uuid: "7509a1eca3f14f45903715fdb6a20f77" I0307 01:24:58.494655 5456 raft_consensus.cc:2886] T e75f819cfb0a45c483899e2396b3a07a P 7509a1eca3f14f45903715fdb6a20f77 [term 0 FOLLOWER]: Advancing to term 1 W0307 01:24:59.988574 5267 leader_election.cc:341] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [CANDIDATE]: Term 1 election: RPC error from VoteRequest() call to peer 7509a1eca3f14f45903715fdb6a20f77 (127.4.141.66:35595): Timed out: RequestConsensusVote RPC to 127.4.141.66:35595 timed out after 1.507s (SENT) W0307 01:24:59.988920 5266 leader_election.cc:341] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [CANDIDATE]: Term 1 election: RPC error from VoteRequest() call to peer 0caf13c7f5a64af781811ca30ab3656d (127.4.141.67:32845): Timed out: RequestConsensusVote RPC to 127.4.141.67:32845 timed out after 1.507s (SENT) I0307 01:24:59.989068 5266 leader_election.cc:310] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [CANDIDATE]: Term 1 election: Election decided. Result: candidate lost.
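The fix the summary points at is for the test to retry the forced election when a round times out or is lost, since failure detection is disabled and nothing else will trigger a new one. A minimal sketch (hypothetical helper, not the actual test harness code):

```cpp
// Hypothetical helper, not the actual test harness: with failure detection
// disabled, a forced election that times out or is lost must be retried
// explicitly, otherwise the cluster can stay leaderless and the test hangs.
#include <functional>

bool RunElectionWithRetries(const std::function<bool()>& run_election,
                            int max_attempts) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (run_election()) return true;  // a candidate won this round
    // A real harness would sleep with backoff here before retrying.
  }
  return false;  // no leader emerged within the attempt budget
}
```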
[jira] [Created] (KUDU-2738) linked_list-test occasionally fails with webserver port bind failure: address already in use
Mike Percy created KUDU-2738: Summary: linked_list-test occasionally fails with webserver port bind failure: address already in use Key: KUDU-2738 URL: https://issues.apache.org/jira/browse/KUDU-2738 Project: Kudu Issue Type: Bug Components: test Affects Versions: 1.9.0 Reporter: Mike Percy Occasionally I see linked_list-test fail with the following error on Linux in an automated test environment: {code:java} E0306 23:35:25.207222 19523 webserver.cc:369] Webserver: set_ports_option: cannot bind to 127.14.25.194:49008: 98 (Address already in use) W0306 23:35:25.207244 19523 net_util.cc:457] Trying to use lsof to find any processes listening on 0.0.0.0:49008 I0306 23:35:25.207249 19523 net_util.cc:460] $ export PATH=$PATH:/usr/sbin ; lsof -n -i 'TCP:49008' -sTCP:LISTEN ; for pid in $(lsof -F p -n -i 'TCP:49008' -sTCP:LISTEN | grep p | cut -f 2 -dp) ; do while [ $pid -gt 1 ] ; do ps h -fp $pid ; stat=($(
[jira] [Updated] (KUDU-2738) linked_list-test occasionally fails with webserver port bind failure: address already in use
[ https://issues.apache.org/jira/browse/KUDU-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2738: - Attachment: jenkins_output.txt.gz > linked_list-test occasionally fails with webserver port bind failure: address > already in use > > > Key: KUDU-2738 > URL: https://issues.apache.org/jira/browse/KUDU-2738 > Project: Kudu > Issue Type: Bug > Components: test >Affects Versions: 1.9.0 >Reporter: Mike Percy >Priority: Trivial > Attachments: jenkins_output.txt.gz > > > Occasionally I see linked_list-test fail with the following error on Linux in > an automated test environment: > {code:java} > E0306 23:35:25.207222 19523 webserver.cc:369] Webserver: set_ports_option: > cannot bind to 127.14.25.194:49008: 98 (Address already in use) > W0306 23:35:25.207244 19523 net_util.cc:457] Trying to use lsof to find any > processes listening on 0.0.0.0:49008 > I0306 23:35:25.207249 19523 net_util.cc:460] $ export PATH=$PATH:/usr/sbin ; > lsof -n -i 'TCP:49008' -sTCP:LISTEN ; for pid in $(lsof -F p -n -i > 'TCP:49008' -sTCP:LISTEN | grep p | cut -f 2 -dp) ; do while [ $pid -gt 1 ] ; > do ps h -fp $pid ; stat=($( ... > W0306 23:35:25.583075 19523 net_util.cc:467] > F0306 23:35:25.583206 19523 tablet_server_main.cc:89] Check failed: _s.ok() > Bad status: Runtime error: Webserver: could not start on address > 127.14.25.194:49008: set_ports_option: cannot bind to 127.14.25.194:49008: 98 > (Address already in use){code} > I am not sure what would have bound to 0.0.0.0:49008 for a short period of > time, or used 127.14.25.194:49008 as an ephemeral address / port pair since > it's such a unique loopback IP address. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
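A common mitigation for this class of test failure is to ask the kernel for a currently free port by binding to port 0, rather than choosing a fixed port that can transiently be in use as an ephemeral port. A hedged POSIX sketch (`PickFreePort` is a hypothetical helper; as the comments note, it is still racy, which is exactly the window this report describes):

```cpp
// POSIX sketch of the usual mitigation: ask the kernel for a currently free
// port by binding to port 0, instead of picking a fixed port that can
// transiently be in use as an ephemeral port. PickFreePort is a hypothetical
// helper; note it is still racy, since another process can grab the port
// between this probe and the webserver's own bind().
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>

uint16_t PickFreePort() {
  const int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return 0;
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // 0 asks the kernel to choose an unused port
  uint16_t port = 0;
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) == 0) {
      port = ntohs(addr.sin_port);  // the port the kernel picked
    }
  }
  close(fd);
  return port;  // 0 on error
}
```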
[jira] [Created] (KUDU-2736) RemoteKsckTest.TestClusterWithLocation is flaky
Mike Percy created KUDU-2736: Summary: RemoteKsckTest.TestClusterWithLocation is flaky Key: KUDU-2736 URL: https://issues.apache.org/jira/browse/KUDU-2736 Project: Kudu Issue Type: Improvement Components: test Affects Versions: 1.9.0 Reporter: Mike Percy RemoteKsckTest.TestClusterWithLocation is flaky Alexey took a look at it and here is the analysis: In essence, due to slowness of TSAN builds, connection negotiation from the kudu CLI to one of the master servers timed out, so one of the test's preconditions wasn't met. The error output by the test was: {code:java} /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/ksck_remote-test.cc:523: Failure Failed Bad status: Network error: failed to gather info from all masters: 1 of 3 had errors {code} The corresponding error in the master's log was: {code:java} W0221 12:38:27.119146 31380 negotiation.cc:313] Failed RPC negotiation. Trace: 0221 12:38:23.949428 (+ 0us) reactor.cc:583] Submitting negotiation task for client connection to 127.25.42.190:51799 0221 12:38:25.362220 (+1412792us) negotiation.cc:98] Waiting for socket to connect 0221 12:38:25.363489 (+ 1269us) client_negotiation.cc:167] Beginning negotiation 0221 12:38:25.369976 (+ 6487us) client_negotiation.cc:244] Sending NEGOTIATE NegotiatePB request 0221 12:38:25.431582 (+ 61606us) client_negotiation.cc:261] Received NEGOTIATE NegotiatePB response 0221 12:38:25.431610 (+ 28us) client_negotiation.cc:355] Received NEGOTIATE response from server 0221 12:38:25.432659 (+ 1049us) client_negotiation.cc:182] Negotiated authn=SASL 0221 12:38:27.051125 (+1618466us) client_negotiation.cc:483] Received TLS_HANDSHAKE response from server 0221 12:38:27.062085 (+ 10960us) client_negotiation.cc:471] Sending TLS_HANDSHAKE message to server 0221 12:38:27.062132 (+ 47us) client_negotiation.cc:244] Sending TLS_HANDSHAKE NegotiatePB request 0221 12:38:27.064391 (+ 2259us) negotiation.cc:304] Negotiation complete: Timed out: Client connection negotiation failed: client 
connection to 127.25.42.190:51799: BlockingWrite timed out {code} We are seeing this on the flaky test dashboard for both TSAN and ASAN builds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2735) RemoteKsckTest.TestClusterWithLocation is flaky
Mike Percy created KUDU-2735: Summary: RemoteKsckTest.TestClusterWithLocation is flaky Key: KUDU-2735 URL: https://issues.apache.org/jira/browse/KUDU-2735 Project: Kudu Issue Type: Improvement Components: test Affects Versions: 1.9.0 Reporter: Mike Percy RemoteKsckTest.TestClusterWithLocation is flaky Alexey took a look at it and here is the analysis: In essence, due to slowness of TSAN builds, connection negotiation from the kudu CLI to one of the master servers timed out, so one of the test's preconditions wasn't met. The error output by the test was: {code:java} /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/ksck_remote-test.cc:523: Failure Failed Bad status: Network error: failed to gather info from all masters: 1 of 3 had errors {code} The corresponding error in the master's log was: {code:java} W0221 12:38:27.119146 31380 negotiation.cc:313] Failed RPC negotiation. Trace: 0221 12:38:23.949428 (+ 0us) reactor.cc:583] Submitting negotiation task for client connection to 127.25.42.190:51799 0221 12:38:25.362220 (+1412792us) negotiation.cc:98] Waiting for socket to connect 0221 12:38:25.363489 (+ 1269us) client_negotiation.cc:167] Beginning negotiation 0221 12:38:25.369976 (+ 6487us) client_negotiation.cc:244] Sending NEGOTIATE NegotiatePB request 0221 12:38:25.431582 (+ 61606us) client_negotiation.cc:261] Received NEGOTIATE NegotiatePB response 0221 12:38:25.431610 (+ 28us) client_negotiation.cc:355] Received NEGOTIATE response from server 0221 12:38:25.432659 (+ 1049us) client_negotiation.cc:182] Negotiated authn=SASL 0221 12:38:27.051125 (+1618466us) client_negotiation.cc:483] Received TLS_HANDSHAKE response from server 0221 12:38:27.062085 (+ 10960us) client_negotiation.cc:471] Sending TLS_HANDSHAKE message to server 0221 12:38:27.062132 (+ 47us) client_negotiation.cc:244] Sending TLS_HANDSHAKE NegotiatePB request 0221 12:38:27.064391 (+ 2259us) negotiation.cc:304] Negotiation complete: Timed out: Client connection negotiation failed: client 
connection to 127.25.42.190:51799: BlockingWrite timed out {code} We are seeing this on the flaky test dashboard for both TSAN and ASAN builds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2734) RemoteKsckTest.TestClusterWithLocation is flaky
Mike Percy created KUDU-2734: Summary: RemoteKsckTest.TestClusterWithLocation is flaky Key: KUDU-2734 URL: https://issues.apache.org/jira/browse/KUDU-2734 Project: Kudu Issue Type: Improvement Components: test Affects Versions: 1.9.0 Reporter: Mike Percy RemoteKsckTest.TestClusterWithLocation is flaky Alexey took a look at it and here is the analysis: In essence, due to slowness of TSAN builds, connection negotiation from the kudu CLI to one of the master servers timed out, so one of the test's preconditions wasn't met. The error output by the test was: {code:java} /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/ksck_remote-test.cc:523: Failure Failed Bad status: Network error: failed to gather info from all masters: 1 of 3 had errors {code} The corresponding error in the master's log was: {code:java} W0221 12:38:27.119146 31380 negotiation.cc:313] Failed RPC negotiation. Trace: 0221 12:38:23.949428 (+ 0us) reactor.cc:583] Submitting negotiation task for client connection to 127.25.42.190:51799 0221 12:38:25.362220 (+1412792us) negotiation.cc:98] Waiting for socket to connect 0221 12:38:25.363489 (+ 1269us) client_negotiation.cc:167] Beginning negotiation 0221 12:38:25.369976 (+ 6487us) client_negotiation.cc:244] Sending NEGOTIATE NegotiatePB request 0221 12:38:25.431582 (+ 61606us) client_negotiation.cc:261] Received NEGOTIATE NegotiatePB response 0221 12:38:25.431610 (+ 28us) client_negotiation.cc:355] Received NEGOTIATE response from server 0221 12:38:25.432659 (+ 1049us) client_negotiation.cc:182] Negotiated authn=SASL 0221 12:38:27.051125 (+1618466us) client_negotiation.cc:483] Received TLS_HANDSHAKE response from server 0221 12:38:27.062085 (+ 10960us) client_negotiation.cc:471] Sending TLS_HANDSHAKE message to server 0221 12:38:27.062132 (+ 47us) client_negotiation.cc:244] Sending TLS_HANDSHAKE NegotiatePB request 0221 12:38:27.064391 (+ 2259us) negotiation.cc:304] Negotiation complete: Timed out: Client connection negotiation failed: client 
connection to 127.25.42.190:51799: BlockingWrite timed out {code} We are seeing this on the flaky test dashboard for both TSAN and ASAN builds. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-2390) ITClient fails with "Row count unexpectedly decreased"
[ https://issues.apache.org/jira/browse/KUDU-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785186#comment-16785186 ] Mike Percy commented on KUDU-2390: -- Another instance of this observed; attaching the failure log. I haven't spent that much time investigating this yet, but maybe the chaos thread restarting one of the tablet servers could have resulted in an under-count, since one of the tablet servers is restarted 2 seconds before we see the error in the log: 20:03:34.125 [INFO - Thread-5] (MiniKuduCluster.java:368) Killing tablet server 127.1.121.66:48003 20:03:34.131 [INFO - Thread-5] (MiniKuduCluster.java:349) Starting tablet server 127.1.121.66:48003 20:03:36.094 [ERROR - Thread-7] (ITClient.java:135) Row count unexpectedly decreased from 87549 to 59949 > ITClient fails with "Row count unexpectedly decreased" > -- > > Key: KUDU-2390 > URL: https://issues.apache.org/jira/browse/KUDU-2390 > Project: Kudu > Issue Type: Bug > Components: java, test >Affects Versions: 1.7.0, 1.8.0 >Reporter: Todd Lipcon >Priority: Critical > Attachments: Stdout.txt.gz, TEST-org.apache.kudu.client.ITClient.xml, > TEST-org.apache.kudu.client.ITClient.xml.gz, > TEST-org.apache.kudu.client.ITClient.xml.xz > > > On master, hit the following failure of ITClient: > {code} > 20:05:05.407 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:934) > AsyncKuduScanner$Response(scannerId = "6ddf5d0da48241aea4b9eb51645716cc", > data = RowResultIterator for 27600 rows, more = true, responseScanTimestamp = > 6234957022375723008) for scanner > 20:05:05.407 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:447) Scanner > "6ddf5d0da48241aea4b9eb51645716cc" opened on > d78cb5506f6e4e17bd54fdaf1819a8a2@[729d64003e7740cabb650f8f6aea4af6(127.1.76.194:60468),7a2e5f9b2be9497fadc30b81a6a50b24(127.1.76.19 > 20:05:05.409 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:934) > AsyncKuduScanner$Response(scannerId = "", data = RowResultIterator for 7314 > rows, 
more = false) for scanner > KuduScanner(table=org.apache.kudu.client.ITClient-1522206255318, tablet=d78c > 20:05:05.409 [INFO - Thread-4] (ITClient.java:397) New row count 90114 > 20:05:05.414 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:934) > AsyncKuduScanner$Response(scannerId = "c230614ad13e40478254b785995d1d7c", > data = RowResultIterator for 27600 rows, more = true, responseScanTimestamp = > 6234957022413987840) for scanner > 20:05:05.414 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:447) Scanner > "c230614ad13e40478254b785995d1d7c" opened on > d78cb5506f6e4e17bd54fdaf1819a8a2@[729d64003e7740cabb650f8f6aea4af6(127.1.76.194:60468),7a2e5f9b2be9497fadc30b81a6a50b24(127.1.76.19 > 20:05:05.419 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:934) > AsyncKuduScanner$Response(scannerId = "", data = RowResultIterator for 27600 > rows, more = true) for scanner > KuduScanner(table=org.apache.kudu.client.ITClient-1522206255318, tablet=d78c > 20:05:05.420 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:934) > AsyncKuduScanner$Response(scannerId = "", data = RowResultIterator for 7342 > rows, more = false) for scanner > KuduScanner(table=org.apache.kudu.client.ITClient-1522206255318, tablet=d78c > 20:05:05.421 [ERROR - Thread-4] (ITClient.java:134) Row count unexpectedly > decreased from 90114to 62542 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
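The hypothesis above, that a count taken while the chaos thread restarts a tablet server can transiently under-count, suggests a defensive check in the verifier: re-scan before declaring a decrease. A sketch under that assumption (hypothetical helper, not ITClient's actual code):

```cpp
// Sketch under the assumption above (hypothetical helper, not ITClient's
// code): before reporting a decreased row count, re-scan a few times so a
// transient dip caused by a mid-restart read is not mistaken for data loss.
#include <cstdint>
#include <functional>

bool RowCountDecreased(const std::function<int64_t()>& count_rows,
                       int64_t last_count, int retries) {
  for (int attempt = 0; attempt <= retries; ++attempt) {
    if (count_rows() >= last_count) return false;  // caught up: transient dip
  }
  return true;  // the decrease persisted across re-scans: likely a real bug
}
```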
[jira] [Updated] (KUDU-2390) ITClient fails with "Row count unexpectedly decreased"
[ https://issues.apache.org/jira/browse/KUDU-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2390: - Attachment: TEST-org.apache.kudu.client.ITClient.xml.gz > ITClient fails with "Row count unexpectedly decreased" > -- > > Key: KUDU-2390 > URL: https://issues.apache.org/jira/browse/KUDU-2390 > Project: Kudu > Issue Type: Bug > Components: java, test >Affects Versions: 1.7.0, 1.8.0 >Reporter: Todd Lipcon >Priority: Critical > Attachments: Stdout.txt.gz, TEST-org.apache.kudu.client.ITClient.xml, > TEST-org.apache.kudu.client.ITClient.xml.gz, > TEST-org.apache.kudu.client.ITClient.xml.xz > > > On master, hit the following failure of ITClient: > {code} > 20:05:05.407 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:934) > AsyncKuduScanner$Response(scannerId = "6ddf5d0da48241aea4b9eb51645716cc", > data = RowResultIterator for 27600 rows, more = true, responseScanTimestamp = > 6234957022375723008) for scanner > 20:05:05.407 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:447) Scanner > "6ddf5d0da48241aea4b9eb51645716cc" opened on > d78cb5506f6e4e17bd54fdaf1819a8a2@[729d64003e7740cabb650f8f6aea4af6(127.1.76.194:60468),7a2e5f9b2be9497fadc30b81a6a50b24(127.1.76.19 > 20:05:05.409 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:934) > AsyncKuduScanner$Response(scannerId = "", data = RowResultIterator for 7314 > rows, more = false) for scanner > KuduScanner(table=org.apache.kudu.client.ITClient-1522206255318, tablet=d78c > 20:05:05.409 [INFO - Thread-4] (ITClient.java:397) New row count 90114 > 20:05:05.414 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:934) > AsyncKuduScanner$Response(scannerId = "c230614ad13e40478254b785995d1d7c", > data = RowResultIterator for 27600 rows, more = true, responseScanTimestamp = > 6234957022413987840) for scanner > 20:05:05.414 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:447) Scanner > "c230614ad13e40478254b785995d1d7c" opened on > 
d78cb5506f6e4e17bd54fdaf1819a8a2@[729d64003e7740cabb650f8f6aea4af6(127.1.76.194:60468),7a2e5f9b2be9497fadc30b81a6a50b24(127.1.76.19 > 20:05:05.419 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:934) > AsyncKuduScanner$Response(scannerId = "", data = RowResultIterator for 27600 > rows, more = true) for scanner > KuduScanner(table=org.apache.kudu.client.ITClient-1522206255318, tablet=d78c > 20:05:05.420 [DEBUG - New I/O worker #17] (AsyncKuduScanner.java:934) > AsyncKuduScanner$Response(scannerId = "", data = RowResultIterator for 7342 > rows, more = false) for scanner > KuduScanner(table=org.apache.kudu.client.ITClient-1522206255318, tablet=d78c > 20:05:05.421 [ERROR - Thread-4] (ITClient.java:134) Row count unexpectedly > decreased from 90114to 62542 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2733) ITClient java test flaky: chaos thread failure: Couldn't restart a TS
Mike Percy created KUDU-2733: Summary: ITClient java test flaky: chaos thread failure: Couldn't restart a TS Key: KUDU-2733 URL: https://issues.apache.org/jira/browse/KUDU-2733 Project: Kudu Issue Type: Improvement Components: java, test Affects Versions: 1.9.0 Reporter: Mike Percy Attachments: TEST-org.apache.kudu.client.ITClient.xml Sometimes in ITClient.test(), the chaos thread cannot restart the tablet server. The error looks like this: {code:java} 03:53:33.233 [ERROR - Thread-13] (ITClient.java:135) Couldn't restart a TS java.lang.RuntimeException: Tablet server 127.26.66.66:38801 not found at org.apache.kudu.test.cluster.MiniKuduCluster.getTabletServer(MiniKuduCluster.java:513) at org.apache.kudu.test.cluster.MiniKuduCluster.killTabletServer(MiniKuduCluster.java:364) at org.apache.kudu.test.KuduTestHarness.restartTabletServer(KuduTestHarness.java:285) at org.apache.kudu.client.ITClient$ChaosThread.restartTS(ITClient.java:207) at org.apache.kudu.client.ITClient$ChaosThread.run(ITClient.java:158) at java.lang.Thread.run(Thread.java:745) {code} Attaching a test log. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KUDU-2411) Create a public test utility artifact
[ https://issues.apache.org/jira/browse/KUDU-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy resolved KUDU-2411. -- Resolution: Fixed Fix Version/s: 1.9.0 This capability made it into 1.9.0 > Create a public test utility artifact > - > > Key: KUDU-2411 > URL: https://issues.apache.org/jira/browse/KUDU-2411 > Project: Kudu > Issue Type: Improvement > Components: java >Affects Versions: 1.7.0 >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > Labels: community > Fix For: 1.9.0 > > > Create a public published test utility jar that contains useful testing > utilities for applications that integrate with Kudu including things like > BaseKuduTest.java and MiniKuduCluster.java. > This has the added benefit of eliminating the unusual dependency on all of > kudu-clients test in each of the other java modules. This could likely be > used in our examples code too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-1868) Java client mishandles socket read timeouts for scans
[ https://issues.apache.org/jira/browse/KUDU-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783805#comment-16783805 ] Mike Percy commented on KUDU-1868: -- Merged as part of these patches from Will: * [https://gerrit.cloudera.org/c/12338/] * [https://gerrit.cloudera.org/c/12363/] > Java client mishandles socket read timeouts for scans > - > > Key: KUDU-1868 > URL: https://issues.apache.org/jira/browse/KUDU-1868 > Project: Kudu > Issue Type: Bug > Components: client >Affects Versions: 1.2.0 >Reporter: Jean-Daniel Cryans >Assignee: Will Berkeley >Priority: Major > Labels: backup > > Scan calls from the Java client that take more than the socket read timeout > get retried (unless the operation timeout has expired) instead of being > killed. Users will see this: > {code} > org.apache.kudu.client.NonRecoverableException: Invalid call sequence ID in > scan request > {code} > Note that the right behavior here would still end up killing the scanner, so > this is really a problem the user has to deal with! It's usually caused by > slow IO, combined with very selective scans. > Workaround: set defaultSocketReadTimeoutMs higher, ideally equal to > defaultOperationTimeoutMs (the defaults are 10 and 30 seconds respectively). > But really the user should investigate why the scans are so slow. > One potentially easy fix to this is to handle retries differently for > scanners so that the user gets a nicer exception. A harder fix is to handle > socket read timeouts completely differently: basically, the timeout should be per-RPC > and not per TabletClient like it is right now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KUDU-1868) Java client mishandles socket read timeouts for scans
[ https://issues.apache.org/jira/browse/KUDU-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy resolved KUDU-1868. -- Resolution: Fixed Fix Version/s: 1.9.0 > Java client mishandles socket read timeouts for scans > - > > Key: KUDU-1868 > URL: https://issues.apache.org/jira/browse/KUDU-1868 > Project: Kudu > Issue Type: Bug > Components: client >Affects Versions: 1.2.0 >Reporter: Jean-Daniel Cryans >Assignee: Will Berkeley >Priority: Major > Labels: backup > Fix For: 1.9.0 > > > Scan calls from the Java client that take more than the socket read timeout > get retried (unless the operation timeout has expired) instead of being > killed. Users will see this: > {code} > org.apache.kudu.client.NonRecoverableException: Invalid call sequence ID in > scan request > {code} > Note that the right behavior here would still end up killing the scanner, so > this is really a problem the user has to deal with! It's usually caused by > slow IO, combined with very selective scans. > Workaround: set defaultSocketReadTimeoutMs higher, ideally equal to > defaultOperationTimeoutMs (the defaults are 10 and 30 seconds respectively). > But really the user should investigate why the scans are so slow. > One potentially easy fix to this is to handle retries differently for > scanners so that the user gets a nicer exception. A harder fix is to handle > socket read timeouts completely differently: basically, the timeout should be per-RPC > and not per TabletClient like it is right now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2724) Binary jar build on OSX should specify target macos version
Mike Percy created KUDU-2724: Summary: Binary jar build on OSX should specify target macos version Key: KUDU-2724 URL: https://issues.apache.org/jira/browse/KUDU-2724 Project: Kudu Issue Type: Improvement Reporter: Mike Percy The binary test jar build should use one of the commonly-used options to specify a target macOS version when building the binary jar, so that it isn't required to build on an old platform to get wide compatibility. The common methods seem to be documented here: [https://cmake.org/cmake/help/v3.0/variable/CMAKE_OSX_DEPLOYMENT_TARGET.html] These include specifying the compiler flag -mmacosx-version-min, the environment variable MACOSX_DEPLOYMENT_TARGET, or the CMake variable CMAKE_OSX_DEPLOYMENT_TARGET. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
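For illustration, a hedged sketch of the three equivalent mechanisms named above, following the linked CMake documentation (the version number 10.10 is an arbitrary example, not a project recommendation):

```cmake
# Option 1: CMake cache variable (CMake derives the compiler and linker
# flags from it). Set before project(), or pass on the command line:
#   cmake -DCMAKE_OSX_DEPLOYMENT_TARGET=10.10 ..
set(CMAKE_OSX_DEPLOYMENT_TARGET "10.10" CACHE STRING "Minimum macOS version")

# Option 2: environment variable, which CMake reads as the default value
# for the cache variable above on the first configure:
#   MACOSX_DEPLOYMENT_TARGET=10.10 cmake ..

# Option 3: raw compiler flag, equivalent at the compile-command level:
#   clang++ -mmacosx-version-min=10.10 ...
```

Any one of these would let the binary jar be built on a newer macOS host while remaining loadable on older releases.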
[jira] [Created] (KUDU-2696) libgmock is linked into the kudu cli binary
Mike Percy created KUDU-2696: Summary: libgmock is linked into the kudu cli binary Key: KUDU-2696 URL: https://issues.apache.org/jira/browse/KUDU-2696 Project: Kudu Issue Type: Bug Affects Versions: 1.8.0 Reporter: Mike Percy libgmock is linked into the kudu cli binary, even though we consider it a test-only dependency. Possibly a configuration problem in our cmake files? {code:java} $ ldd build/dynclang/bin/kudu | grep mock libgmock.so => /home/mpercy/src/kudu/thirdparty/installed/uninstrumented/lib/libgmock.so (0x7f01f1495000) {code} The gmock dependency does not appear in the server binaries, as expected. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2694) DeleteTabletITest.TestLeaderElectionDuringDeleteTablet is flaky
Mike Percy created KUDU-2694: Summary: DeleteTabletITest.TestLeaderElectionDuringDeleteTablet is flaky Key: KUDU-2694 URL: https://issues.apache.org/jira/browse/KUDU-2694 Project: Kudu Issue Type: Bug Components: consensus Reporter: Mike Percy Attachments: delete_tablet-itest.txt.gz DeleteTabletITest.TestLeaderElectionDuringDeleteTablet is slightly flaky and reporting bad health from the leader in some cases. Attaching log file from a dist-test flaky-test job run. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KUDU-2645) Diff scanner should perform a merge on the rowset iterators at scan time
[ https://issues.apache.org/jira/browse/KUDU-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2645: - Status: In Review (was: In Progress) > Diff scanner should perform a merge on the rowset iterators at scan time > > > Key: KUDU-2645 > URL: https://issues.apache.org/jira/browse/KUDU-2645 > Project: Kudu > Issue Type: New Feature > Components: tablet >Reporter: Mike Percy >Assignee: Mike Percy >Priority: Major > > In order to perform a diff scan we will need the MergeIterator to ensure that > duplicate ghost rows are not returned in cases where a row was deleted and > flushed, then reinserted into a new rowset during the time period covered by > the diff scan. In such a case, only one representation of the row should be > returned, which is the reinserted one. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KUDU-2645) Diff scanner should perform a merge on the rowset iterators at scan time
[ https://issues.apache.org/jira/browse/KUDU-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2645: - Code Review: https://gerrit.cloudera.org/c/12205/ > Diff scanner should perform a merge on the rowset iterators at scan time > > > Key: KUDU-2645 > URL: https://issues.apache.org/jira/browse/KUDU-2645 > Project: Kudu > Issue Type: New Feature > Components: tablet >Reporter: Mike Percy >Assignee: Mike Percy >Priority: Major > > In order to perform a diff scan we will need the MergeIterator to ensure that > duplicate ghost rows are not returned in cases where a row was deleted and > flushed, then reinserted into a new rowset during the time period covered by > the diff scan. In such a case, only one representation of the row should be > returned, which is the reinserted one. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2693) Buffer DiskRowSet flushes to more efficiently write many columns
Mike Percy created KUDU-2693: Summary: Buffer DiskRowSet flushes to more efficiently write many columns Key: KUDU-2693 URL: https://issues.apache.org/jira/browse/KUDU-2693 Project: Kudu Issue Type: Improvement Components: fs, tablet Affects Versions: 1.9.0 Reporter: Mike Percy When looking at a trace of some MRS flushes on a table with 280 columns, it was observed that during the course of the flush some 695 fdatasync() calls occurred. One possible way to minimize the number of fsync calls would be to flush directly to memory buffers first, determine the ideal layout on disk for the flushed blocks (possibly striped across one log block container per data disk) and then potentially write the data out to the containers in parallel. This would require some memory buffer space to be reserved per maintenance manager thread, possibly 64MB since the DRS roll size is 32MB. According to Todd we could probably do it all in LogBlockManager by adding a new flag to CreateBlockOptions that says whether to buffer or something like that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
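As a toy-level sketch of the buffering idea (plain Python file I/O, not Kudu's actual LogBlockManager API): stage each column's block in memory first, decide the layout, then write everything to the container and pay a single sync, instead of one sync per block.

```python
import os

def flush_columns_unbuffered(path, columns):
    """One durability barrier per column block: N columns -> N fsyncs."""
    syncs = 0
    with open(path, "wb") as f:
        for data in columns:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # per-block sync, as observed in the trace
            syncs += 1
    return syncs

def flush_columns_buffered(path, columns):
    """Stage blocks in memory, write once, pay a single fsync."""
    buf = bytearray()
    for data in columns:  # choose the on-disk layout up front
        buf.extend(data)
    with open(path, "wb") as f:
        f.write(bytes(buf))
        f.flush()
        os.fsync(f.fileno())
    return 1
```

With 280 columns, the unbuffered path pays hundreds of syncs per container touched, which is consistent with the ~695 fdatasync() calls observed during the flush; the buffered path pays one per container.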
[jira] [Created] (KUDU-2692) Remove requirements for virtual columns to specify a read default and not be nullable
Mike Percy created KUDU-2692: Summary: Remove requirements for virtual columns to specify a read default and not be nullable Key: KUDU-2692 URL: https://issues.apache.org/jira/browse/KUDU-2692 Project: Kudu Issue Type: Improvement Components: tablet Reporter: Mike Percy Virtual column types such as IS_DELETED currently require a read default to be specified, in addition to not being allowed to be nullable. Consider relaxing these requirements to improve the user experience when working with virtual columns. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KUDU-2691) AlterTable transactions should anchor their ops in the WAL
[ https://issues.apache.org/jira/browse/KUDU-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2691: - Component/s: consensus > AlterTable transactions should anchor their ops in the WAL > -- > > Key: KUDU-2691 > URL: https://issues.apache.org/jira/browse/KUDU-2691 > Project: Kudu > Issue Type: Bug > Components: consensus, log, tablet >Affects Versions: 1.9.0 >Reporter: Mike Percy >Priority: Major > > AlterTable does not appear to anchor its WAL ops, meaning there is nothing > preventing Kudu from GCing a WAL segment including an AlterTable that is > running very slowly for some reason. If that happens and then the tserver is > killed, it's possible for that replica to fail to start back up later. We > should anchor alter ops in the same way we anchor write operations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KUDU-2691) AlterTable transactions should anchor their ops in the WAL
[ https://issues.apache.org/jira/browse/KUDU-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2691: - Component/s: tablet > AlterTable transactions should anchor their ops in the WAL > -- > > Key: KUDU-2691 > URL: https://issues.apache.org/jira/browse/KUDU-2691 > Project: Kudu > Issue Type: Bug > Components: log, tablet >Affects Versions: 1.9.0 >Reporter: Mike Percy >Priority: Major > > AlterTable does not appear to anchor its WAL ops, meaning there is nothing > preventing Kudu from GCing a WAL segment including an AlterTable that is > running very slowly for some reason. If that happens and then the tserver is > killed, it's possible for that replica to fail to start back up later. We > should anchor alter ops in the same way we anchor write operations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2691) AlterTable transactions should anchor their ops in the WAL
Mike Percy created KUDU-2691: Summary: AlterTable transactions should anchor their ops in the WAL Key: KUDU-2691 URL: https://issues.apache.org/jira/browse/KUDU-2691 Project: Kudu Issue Type: Bug Components: log Affects Versions: 1.9.0 Reporter: Mike Percy AlterTable does not appear to anchor its WAL ops, meaning there is nothing preventing Kudu from GCing a WAL segment including an AlterTable that is running very slowly for some reason. If that happens and then the tserver is killed, it's possible for that replica to fail to start back up later. We should anchor alter ops in the same way we anchor write operations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KUDU-2676) Restore: Support creating tables with greater than the maximum allowed number of partitions
[ https://issues.apache.org/jira/browse/KUDU-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2676: - Summary: Restore: Support creating tables with greater than the maximum allowed number of partitions (was: [Backup] Support restoring tables over the maximum allowed replicas) > Restore: Support creating tables with greater than the maximum allowed number > of partitions > --- > > Key: KUDU-2676 > URL: https://issues.apache.org/jira/browse/KUDU-2676 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.8.0 >Reporter: Grant Henke >Priority: Major > Labels: backup > > Currently it is possible to backup a table that has more partitions than are > allowed at create time. > This results in the restore job failing with the following exception: > {noformat} > 19/01/24 08:17:14 INFO backup.KuduRestore$: Restoring from path: > hdfs:///user/ghenke/kudu-backup-tests/20190124-080741 > Exception in thread "main" org.apache.kudu.client.NonRecoverableException: > the requested number of tablet replicas is over the maximum permitted at > creation time ( > 450), additional tablets may be added by adding range partitions to the table > post-creation > at > org.apache.kudu.client.KuduException.transformException(KuduException.java:110) > at > org.apache.kudu.client.KuduClient.joinAndHandleException(KuduClient.java:365) > at org.apache.kudu.client.KuduClient.createTable(KuduClient.java:109) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2665) BlockManagerStressTest.StressTest is extremely flaky
Mike Percy created KUDU-2665: Summary: BlockManagerStressTest.StressTest is extremely flaky Key: KUDU-2665 URL: https://issues.apache.org/jira/browse/KUDU-2665 Project: Kudu Issue Type: New Feature Components: fs Reporter: Mike Percy After some recent block manager changes the Block Manager Stress Test is about 50% flaky on certain precommit builds. The failure looks like this: {code:java} /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/fs/block_manager-stress-test.cc:518: Failure Failed Bad status: Not found: /data/somelongdirectorytoavoidrpathissues/src/kudutest/block_manager-stress-test.0.BlockManagerStressTest_1.StressTest.1547778831841692-23619/data/e8ab31ef3e2143a5bc6d7a2b40e7805b.data: No such file or directory (error 2) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/fs/block_manager-stress-test.cc:549: Failure Expected: this->InjectNonFatalInconsistencies() doesn't generate new fatal failures in the current thread. Actual: it does. {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-2195) Enforce durability happened before relationships on multiple disks
[ https://issues.apache.org/jira/browse/KUDU-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737707#comment-16737707 ] Mike Percy commented on KUDU-2195: -- Here is the aforementioned band-aid patch for review: https://gerrit.cloudera.org/c/12186/ > Enforce durability happened before relationships on multiple disks > -- > > Key: KUDU-2195 > URL: https://issues.apache.org/jira/browse/KUDU-2195 > Project: Kudu > Issue Type: Bug > Components: consensus, tablet >Reporter: David Alves >Priority: Major > > When using weaker durability semantics (e.g. when log_force_fsync is off) we > should still enforce certain happened-before relationships which are not > currently being enforced when using different disks for the wal and data. > The two cases that come to mind where this is relevant are: > 1) cmeta (c) -> wal (w): We flush cmeta before flushing the wal (for > instance on term change) with the intention that either {}, {c}, or {c, w} > were made durable. > 2) wal (w) -> tablet meta (t): We flush the wal before tablet metadata to > make sure that all commit messages that refer to on-disk row sets (and > deltas) are on disk before the row sets they point to, i.e. with the > intention that either {}, {w}, or {w, t} were made durable. > With strong durability semantics these are always made durable in the right > order. With weaker semantics that is not the case though. If using the same > disk for both the wal and data then the invariants are still preserved, as > buffers will be flushed in the right order, but if using different disks for > the wal and data (and because cmeta is stored with the data) that is not > always the case. > 1) in ext4 is actually safe, because we perform an fsync (indirect, rename() > implies fsync in ext4) when flushing cmeta. But it is not for xfs. > 2) Is not safe in either filesystem. > --- Possible solutions -- > For 1): Store cmeta with the wal; actually always fsync cmeta.
> For 2): Store tablet meta with the wal; always fsync the wal before flushing > tablet meta. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
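The happened-before requirement in case 1) — the cmeta term change must be durable no later than any WAL entry that depends on it — can be sketched with plain file I/O (a toy illustration, not Kudu's code; the file names and helper are hypothetical):

```python
import os

def durable_write(path, data):
    """Write data and fsync both the file and its directory,
    so the bytes and the file's directory entry survive a crash."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dfd)  # persist the create/rename in the directory
    finally:
        os.close(dfd)

def update_term(cmeta_path, wal_path, term, wal_entry):
    # Enforce the cmeta -> wal ordering explicitly: the term change
    # reaches disk before any WAL entry written in that term can.
    durable_write(cmeta_path, str(term).encode())
    durable_write(wal_path, wal_entry)
```

With log_force_fsync off, neither write is explicitly synced, and on separate disks the kernel may persist them in either order — which is exactly how a 0-length or stale cmeta can survive a power loss.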
[jira] [Commented] (KUDU-2195) Enforce durability happened before relationships on multiple disks
[ https://issues.apache.org/jira/browse/KUDU-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737688#comment-16737688 ] Mike Percy commented on KUDU-2195: -- This was recently seen in the wild again. Usually it's people running XFS who experience a power outage that see 0-length cmeta files. We should consider adding a gflag for just the cmeta files so that people running with XFS have a band-aid. > Enforce durability happened before relationships on multiple disks > -- > > Key: KUDU-2195 > URL: https://issues.apache.org/jira/browse/KUDU-2195 > Project: Kudu > Issue Type: Bug > Components: consensus, tablet >Reporter: David Alves >Priority: Major > > When using weaker durability semantics (e.g. when log_force_fsync is off) we > should still enforce certain happened-before relationships which are not > currently being enforced when using different disks for the wal and data. > The two cases that come to mind where this is relevant are: > 1) cmeta (c) -> wal (w): We flush cmeta before flushing the wal (for > instance on term change) with the intention that either {}, {c}, or {c, w} > were made durable. > 2) wal (w) -> tablet meta (t): We flush the wal before tablet metadata to > make sure that all commit messages that refer to on-disk row sets (and > deltas) are on disk before the row sets they point to, i.e. with the > intention that either {}, {w}, or {w, t} were made durable. > With strong durability semantics these are always made durable in the right > order. With weaker semantics that is not the case though. If using the same > disk for both the wal and data then the invariants are still preserved, as > buffers will be flushed in the right order, but if using different disks for > the wal and data (and because cmeta is stored with the data) that is not > always the case. > 1) in ext4 is actually safe, because we perform an fsync (indirect, rename() > implies fsync in ext4) when flushing cmeta.
But it is not for xfs. > 2) Is not safe in either filesystem. > --- Possible solutions -- > For 1): Store cmeta with the wal; actually always fsync cmeta. > For 2): Store tablet meta with the wal; always fsync the wal before flushing > tablet meta. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2652) TsRecoveryITest.TestNoBlockIDReuseIfMissingBlocks potentially flaky
Mike Percy created KUDU-2652: Summary: TsRecoveryITest.TestNoBlockIDReuseIfMissingBlocks potentially flaky Key: KUDU-2652 URL: https://issues.apache.org/jira/browse/KUDU-2652 Project: Kudu Issue Type: New Feature Reporter: Mike Percy Attachments: ts_recovery-itest.txt.gz This test failed for me in a Gerrit pre-commit run with an unrelated change @ [http://jenkins.kudu.apache.org/job/kudu-gerrit/15885] The error was: {code:java} /home/jenkins-slave/workspace/kudu-master/3/src/kudu/integration-tests/ts_recovery-itest.cc:298: Failure Value of: !orphaned_block_ids.empty() Actual: false Expected: true /home/jenkins-slave/workspace/kudu-master/3/src/kudu/util/test_util.cc:323: Failure Failed Timed out waiting for assertion to pass. {code} I am attaching the error log. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2645) Diff scanner should perform a merge on the rowset iterators at scan time
Mike Percy created KUDU-2645: Summary: Diff scanner should perform a merge on the rowset iterators at scan time Key: KUDU-2645 URL: https://issues.apache.org/jira/browse/KUDU-2645 Project: Kudu Issue Type: New Feature Components: tablet Reporter: Mike Percy In order to perform a diff scan we will need the MergeIterator to ensure that duplicate ghost rows are not returned in cases where a row was deleted and flushed, then reinserted into a new rowset during the time period covered by the diff scan. In such a case, only one representation of the row should be returned, which is the reinserted one. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (KUDU-2645) Diff scanner should perform a merge on the rowset iterators at scan time
[ https://issues.apache.org/jira/browse/KUDU-2645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy reassigned KUDU-2645: Assignee: Mike Percy > Diff scanner should perform a merge on the rowset iterators at scan time > > > Key: KUDU-2645 > URL: https://issues.apache.org/jira/browse/KUDU-2645 > Project: Kudu > Issue Type: New Feature > Components: tablet >Reporter: Mike Percy >Assignee: Mike Percy >Priority: Major > > In order to perform a diff scan we will need the MergeIterator to ensure that > duplicate ghost rows are not returned in cases where a row was deleted and > flushed, then reinserted into a new rowset during the time period covered by > the diff scan. In such a case, only one representation of the row should be > returned, which is the reinserted one. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE
[ https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723538#comment-16723538 ] Mike Percy commented on KUDU-1563: -- I know I'm late to this party. I think it's worth modeling what SQL does: INSERT IGNORE in that context operates at a batch or operation level, not a session level. So it seems more of an impedance match to keep this type of error-handling configuration at the operation or batch level from a client API perspective, to avoid requiring SQL clients to constantly be setting session options if they are caching sessions. > Add support for INSERT IGNORE > - > > Key: KUDU-1563 > URL: https://issues.apache.org/jira/browse/KUDU-1563 > Project: Kudu > Issue Type: New Feature >Reporter: Dan Burkert >Assignee: Brock Noland >Priority: Major > Labels: newbie > > The Java client currently has an [option to ignore duplicate row key errors| > https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setIgnoreAllDuplicateRows-boolean-], > which is implemented by filtering the errors on the client side. If we are > going to continue to support this feature (and the consensus seems to be that > we probably should), we should promote it to a first-class operation type > that is handled on the server side. This would have a modest perf. > improvement since fewer errors are returned, and it would allow INSERT IGNORE > ops to be mixed in the same batch as other INSERT, DELETE, UPSERT, etc. ops. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
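As a toy illustration of the distinction (plain Python, not the Kudu client API): making IGNORE an operation-level property means each op in a batch carries its own error semantics, so mixed batches work without any session state.

```python
# Hypothetical row store sketch: the ignore flag lives on the operation,
# not on the session, mirroring SQL's INSERT IGNORE semantics.
class ToyTable:
    def __init__(self):
        self.rows = {}

    def apply_batch(self, ops):
        """ops: list of (op_type, key, value) tuples.
        Returns one status string per op."""
        statuses = []
        for op_type, key, value in ops:
            if op_type == "INSERT" and key in self.rows:
                statuses.append("ERROR: duplicate key")
            elif op_type == "INSERT_IGNORE" and key in self.rows:
                # Handled "server side": no error surfaces to the caller,
                # and the existing row is left untouched.
                statuses.append("OK (ignored duplicate)")
            else:
                self.rows[key] = value
                statuses.append("OK")
        return statuses
```

A session-level setIgnoreAllDuplicateRows flag, by contrast, would force every op in the session into one behavior, which is awkward for SQL clients that cache sessions across statements.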
[jira] [Commented] (KUDU-1575) Backup and restore procedures
[ https://issues.apache.org/jira/browse/KUDU-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720484#comment-16720484 ] Mike Percy commented on KUDU-1575: -- Hey Tim, the latest progress on this: we have some of the low-level work done but are still finishing up the ability to do diff scans, which are the basis for incremental backups. Once we finish that, there is quite a bit of work left to implement restore of incremental backups, plus a lot of testing to ensure perf / scale / stability are all acceptable. No commitment on timeline, but I am hoping a basic version of backup makes it out in the next release or two of Kudu. > Backup and restore procedures > - > > Key: KUDU-1575 > URL: https://issues.apache.org/jira/browse/KUDU-1575 > Project: Kudu > Issue Type: Improvement > Components: master, tserver >Reporter: Mike Percy >Assignee: Mike Percy >Priority: Major > > Kudu needs backup and restore procedures, both for data and for metadata. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-2629) TestHybridTime is flaky
[ https://issues.apache.org/jira/browse/KUDU-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703876#comment-16703876 ] Mike Percy commented on KUDU-2629: -- Saw the same error in a test run today. The error message was: {code:java} java.lang.AssertionError: expected:<4> but was:<3> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.kudu.client.TestHybridTime.test(TestHybridTime.java:167) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:745){code} > TestHybridTime is flaky > --- > > Key: KUDU-2629 > URL: https://issues.apache.org/jira/browse/KUDU-2629 > Project: Kudu > Issue Type: Bug > Components: java, test >Reporter: Andrew Wong >Priority: Major > Attachments: TEST-org.apache.kudu.client.TestHybridTime.xml > > > I saw three back-to-back failures of TestHybridTime in which a scan returned > an unexpected number of rows. I've attached the XML for the test and its > retries. 
[jira] [Assigned] (KUDU-2402) Kudu Gerrit Sign-in link broken with Gerrit New UI
[ https://issues.apache.org/jira/browse/KUDU-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy reassigned KUDU-2402: Assignee: Mike Percy > Kudu Gerrit Sign-in link broken with Gerrit New UI > -- > > Key: KUDU-2402 > URL: https://issues.apache.org/jira/browse/KUDU-2402 > Project: Kudu > Issue Type: Bug > Components: project-infra >Reporter: Mike Percy >Assignee: Mike Percy >Priority: Major > > Not sure if we need to upgrade the gerrit github plugin or what. The Sign In > link is broken after switching to the New UI in Gerrit. The URL I get is: > [https://gerrit.cloudera.org/login/%2Fq%2Fstatus%3Aopen] and that leads to a > 404 error. > Sign-in seems to work fine after switching back to the "Old UI" in Gerrit. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KUDU-2402) Kudu Gerrit Sign-in link broken with Gerrit New UI
[ https://issues.apache.org/jira/browse/KUDU-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy resolved KUDU-2402. -- Resolution: Fixed Fix Version/s: n/a > Kudu Gerrit Sign-in link broken with Gerrit New UI > -- > > Key: KUDU-2402 > URL: https://issues.apache.org/jira/browse/KUDU-2402 > Project: Kudu > Issue Type: Bug > Components: project-infra >Reporter: Mike Percy >Assignee: Mike Percy >Priority: Major > Fix For: n/a > > > Not sure if we need to upgrade the gerrit github plugin or what. The Sign In > link is broken after switching to the New UI in Gerrit. The URL I get is: > [https://gerrit.cloudera.org/login/%2Fq%2Fstatus%3Aopen] and that leads to a > 404 error. > Sign-in seems to work fine after switching back to the "Old UI" in Gerrit. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-2402) Kudu Gerrit Sign-in link broken with Gerrit New UI
[ https://issues.apache.org/jira/browse/KUDU-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680230#comment-16680230 ] Mike Percy commented on KUDU-2402: -- After a lot of time sunk into this, I finally figured out (with the help of Will Wilson from Cloudera IT) that the root cause was an HTTP -> HTTPS redirect problem. I eventually got to the bottom of the Apache httpd configuration that needed to be changed. This is now fixed. > Kudu Gerrit Sign-in link broken with Gerrit New UI > -- > > Key: KUDU-2402 > URL: https://issues.apache.org/jira/browse/KUDU-2402 > Project: Kudu > Issue Type: Bug > Components: project-infra >Reporter: Mike Percy >Priority: Major > > Not sure if we need to upgrade the gerrit github plugin or what. The Sign In > link is broken after switching to the New UI in Gerrit. The URL I get is: > [https://gerrit.cloudera.org/login/%2Fq%2Fstatus%3Aopen] and that leads to a > 404 error. > Sign-in seems to work fine after switching back to the "Old UI" in Gerrit. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
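[Editor's note] The actual Cloudera IT httpd change was not posted, but the class of misconfiguration described (an HTTP -> HTTPS redirect that mangles an already-encoded path such as /login/%2Fq%2Fstatus%3Aopen into a 404) is usually fixed with a path-preserving redirect. A minimal sketch, with hypothetical hostnames:

```apache
<VirtualHost *:80>
    ServerName gerrit.cloudera.org
    # "Redirect" treats the target as a URL prefix and appends the raw
    # request path, so percent-encoded characters (%2F, %3A) pass through
    # unchanged instead of being re-encoded into a broken login URL.
    Redirect permanent / https://gerrit.cloudera.org/
</VirtualHost>
```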
[jira] [Created] (KUDU-2614) Implement asynchronous replication
Mike Percy created KUDU-2614: Summary: Implement asynchronous replication Key: KUDU-2614 URL: https://issues.apache.org/jira/browse/KUDU-2614 Project: Kudu Issue Type: Task Reporter: Mike Percy Implement asynchronous cluster-to-cluster replication (across WAN links) for Kudu. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2612) Implement multi-row transactions
Mike Percy created KUDU-2612: Summary: Implement multi-row transactions Key: KUDU-2612 URL: https://issues.apache.org/jira/browse/KUDU-2612 Project: Kudu Issue Type: Task Reporter: Mike Percy Tracking Jira to implement multi-row / multi-table transactions in Kudu. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KUDU-2613) Implement secondary indexes
Mike Percy created KUDU-2613: Summary: Implement secondary indexes Key: KUDU-2613 URL: https://issues.apache.org/jira/browse/KUDU-2613 Project: Kudu Issue Type: Task Reporter: Mike Percy Tracking Jira to implement secondary indexes in Kudu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-2402) Kudu Gerrit Sign-in link broken with Gerrit New UI
[ https://issues.apache.org/jira/browse/KUDU-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636249#comment-16636249 ] Mike Percy commented on KUDU-2402: -- It looks like Chromium Issue #8373 was fixed in 2.14.7 according to the release notes @ [https://www.gerritcodereview.com/2.14.html#2147] so that should be the minimum version we upgrade to. > Kudu Gerrit Sign-in link broken with Gerrit New UI > -- > > Key: KUDU-2402 > URL: https://issues.apache.org/jira/browse/KUDU-2402 > Project: Kudu > Issue Type: Bug > Components: project-infra >Reporter: Mike Percy >Priority: Major > > Not sure if we need to upgrade the gerrit github plugin or what. The Sign In > link is broken after switching to the New UI in Gerrit. The URL I get is: > [https://gerrit.cloudera.org/login/%2Fq%2Fstatus%3Aopen] and that leads to a > 404 error. > Sign-in seems to work fine after switching back to the "Old UI" in Gerrit. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-1521) Flakiness in TestAsyncKuduSession
[ https://issues.apache.org/jira/browse/KUDU-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627870#comment-16627870 ] Mike Percy commented on KUDU-1521: -- I also observed a case where it failed at the point in the test where it expected the PleaseThrottleException, but it never appeared: {code:java} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.kudu.client.TestAsyncKuduSession.test(TestAsyncKuduSession.java:452) {code} Sounds like the same issue. > Flakiness in TestAsyncKuduSession > - > > Key: KUDU-1521 > URL: https://issues.apache.org/jira/browse/KUDU-1521 > Project: Kudu > Issue Type: Bug > Components: client >Affects Versions: 0.9.1 >Reporter: Adar Dembo >Assignee: Todd Lipcon >Priority: Major > Attachments: > org.apache.kudu.client.TestAsyncKuduSession-TableIsDeleted-output.txt, > org.apache.kudu.client.TestAsyncKuduSession-output.txt, > org.apache.kudu.client.TestAsyncKuduSession.test.log.xz > > > I've been trying to parse the various failures in > http://104.196.14.100/job/kudu-gerrit/2270/BUILD_TYPE=RELEASE. Here's what I > see in the test: > The way test() tests AUTO_FLUSH_BACKGROUND is inherently flaky; a delay while > running test code will give the background flush task a chance to fire when > the test code doesn't expect it. I've seen this lead to no > PleaseThrottleException, but I suspect the first block of test code dealing > with background flushes is flaky too (since it's testing elapsed time). > There are also some test failures that I can't figure out. I've pasted them > below for posterity: > {noformat} > 03:52:14 > testGetTableLocationsErrorCauseSessionStuck(org.kududb.client.TestAsyncKuduSession) > Time elapsed: 100.009 sec <<< ERROR!
> 03:52:14 java.lang.Exception: test timed out after 10 milliseconds > 03:52:14 at java.lang.Object.wait(Native Method) > 03:52:14 at java.lang.Object.wait(Object.java:503) > 03:52:14 at com.stumbleupon.async.Deferred.doJoin(Deferred.java:1136) > 03:52:14 at com.stumbleupon.async.Deferred.join(Deferred.java:1019) > 03:52:14 at > org.kududb.client.TestAsyncKuduSession.testGetTableLocationsErrorCauseSessionStuck(TestAsyncKuduSession.java:133) > 03:52:14 > 03:52:14 > testBatchErrorCauseSessionStuck(org.kududb.client.TestAsyncKuduSession) Time > elapsed: 0.199 sec <<< ERROR! > 03:52:14 org.kududb.client.MasterErrorException: Server[Kudu Master - > 127.13.215.1:64030] NOT_FOUND[code 1]: The table was deleted: Table deleted > at 2016-07-09 03:50:24 UTC > 03:52:14 at > org.kududb.client.TabletClient.dispatchMasterErrorOrReturnException(TabletClient.java:533) > 03:52:14 at org.kududb.client.TabletClient.decode(TabletClient.java:463) > 03:52:14 at org.kududb.client.TabletClient.decode(TabletClient.java:83) > 03:52:14 at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500) > 03:52:14 at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) > 03:52:14 at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > 03:52:14 at > org.kududb.client.TabletClient.handleUpstream(TabletClient.java:638) > 03:52:14 at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > 03:52:14 at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > 03:52:14 at > org.jboss.netty.handler.timeout.ReadTimeoutHandler.messageReceived(ReadTimeoutHandler.java:184) > 03:52:14 at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > 03:52:14 at > 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > 03:52:14 at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) > 03:52:14 at > org.kududb.client.AsyncKuduClient$TabletClientPipeline.sendUpstream(AsyncKuduClient.java:1877) > 03:52:14 at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) > 03:52:14 at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) > 03:52:14 at > org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) > 03:52:14 at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) > 03:52:14 at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) > 03:52:14 at > org.jboss.
[jira] [Commented] (KUDU-2219) org.apache.kudu.client.TestKuduClient.testCloseShortlyAfterOpen flaky
[ https://issues.apache.org/jira/browse/KUDU-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627836#comment-16627836 ] Mike Percy commented on KUDU-2219: -- I traced this down and it turns out that this races with Master leader election. If the Master leader election is a little slow, then KuduClient.exportAuthenticationCredentials() will throw a NoLeaderFoundException after trying once on each Master server. Instead, it should sleep and retry until it hits a timeout. That issue is filed as KUDU-2387. > org.apache.kudu.client.TestKuduClient.testCloseShortlyAfterOpen flaky > - > > Key: KUDU-2219 > URL: https://issues.apache.org/jira/browse/KUDU-2219 > Project: Kudu > Issue Type: Bug > Components: java >Affects Versions: 1.6.0 >Reporter: Todd Lipcon >Assignee: Andrew Wong >Priority: Major > Attachments: > org.apache.kudu.client.TestKuduClient.testCloseShortlyAfterOpen.log.xz > > > This test has an assertion that no exceptions get logged, but it seems to > fail sometimes with an IllegalStateException in the log: > {code} > ERROR - [peer master-127.62.82.1:64034] unexpected exception from downstream > on [id: 0xc4472f9d, /127.62.82.1:58372 :> /127.62.82.1:64034] > java.lang.IllegalStateException > at > com.google.common.base.Preconditions.checkState(Preconditions.java:429) > at > org.apache.kudu.client.Connection.messageReceived(Connection.java:264) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at org.apache.kudu.client.Connection.handleUpstream(Connection.java:236) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.timeout.ReadTimeoutHandler.messageReceived(ReadTimeoutHandler.java:184) > at >
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:68) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:291) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) > at org.apache.kudu.client.Negotiator.finish(Negotiator.java:653) > at > org.apache.kudu.client.Negotiator.handleSuccessResponse(Negotiator.java:641) > at > org.apache.kudu.client.Negotiator.handleSaslMessage(Negotiator.java:278) > at org.apache.kudu.client.Negotiator.handleResponse(Negotiator.java:258) > at > org.apache.kudu.client.Negotiator.messageReceived(Negotiator.java:231) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > 
org.jboss.netty.handler.timeout.ReadTimeoutHandler.messageReceived(ReadTimeoutHandler.java:184) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70) > at > org.jb
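[Editor's note] The fix proposed in the KUDU-2219 comment above (sleep and retry until an overall timeout elapses, instead of failing after a single pass over the masters) can be sketched in plain Java. All names here are stand-ins: RecoverableException plays the role of NoLeaderFoundException, and none of the Kudu client classes are used.

```java
import java.util.concurrent.Callable;

/** Sketch of retry-with-backoff-until-deadline for a recoverable failure. */
public class RetryUntilTimeout {
  public static class RecoverableException extends Exception {
    public RecoverableException(String msg) { super(msg); }
  }

  public static <T> T call(Callable<T> op, long timeoutMs, long sleepMs)
      throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      try {
        return op.call();
      } catch (RecoverableException e) {
        // No leader yet (e.g. master election still in progress):
        // back off and retry unless the deadline would pass first.
        if (System.currentTimeMillis() + sleepMs >= deadline) {
          throw e;
        }
        Thread.sleep(sleepMs);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Simulate a leader that only becomes available on the third attempt.
    final int[] attempts = {0};
    String token = call(() -> {
      if (++attempts[0] < 3) {
        throw new RecoverableException("no leader found");
      }
      return "credentials";
    }, 5000, 10);
    System.out.println(token + " after " + attempts[0] + " attempts");
  }
}
```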
[jira] [Created] (KUDU-2584) Flaky testSimpleBackupAndRestore
Mike Percy created KUDU-2584: Summary: Flaky testSimpleBackupAndRestore Key: KUDU-2584 URL: https://issues.apache.org/jira/browse/KUDU-2584 Project: Kudu Issue Type: Bug Components: backup Reporter: Mike Percy testSimpleBackupAndRestore is flaky and tends to fail with the following error: {code:java} 04:48:06.604 [ERROR - Test worker] (RetryRule.java:72) testRandomBackupAndRestore(org.apache.kudu.backup.TestKuduBackup): failed run 1 java.lang.AssertionError: expected:<111> but was:<110> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.kudu.backup.TestKuduBackup.testRandomBackupAndRestore(TestKuduBackup.scala:99) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.apache.kudu.junit.RetryRule$RetryStatement.evaluate(RetryRule.java:68) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) 
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:483) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:137) at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404) at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63) at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(
[jira] [Updated] (KUDU-2583) LeakSanitizer failure in kudu-admin-test
[ https://issues.apache.org/jira/browse/KUDU-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2583: - Issue Type: Bug (was: Improvement) > LeakSanitizer failure in kudu-admin-test > > > Key: KUDU-2583 > URL: https://issues.apache.org/jira/browse/KUDU-2583 > Project: Kudu > Issue Type: Bug >Reporter: Mike Percy >Priority: Major > > Saw this error in an automated test run from kudu-admin-test in > DDLDuringRebalancingTest.TablesCreatedAndDeletedDuringRebalancing/0: > {code:java} > ==27773==ERROR: LeakSanitizer: detected memory leaks > Direct leak of 50 byte(s) in 1 object(s) allocated from: > #0 0x531928 in operator new(unsigned long) > /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:92 > > #1 0x377b29c3c8 in std::string::_Rep::_S_create(unsigned long, unsigned long, > std::allocator const&) (/usr/lib64/libstdc++.so.6+0x377b29c3c8) > Direct leak of 40 byte(s) in 1 object(s) allocated from: > #0 0x531928 in operator new(unsigned long) > /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:92 > > #1 0x7fe3255f5ccf in > _ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EEC2IN4kudu15ClosureRunnableESaIS5_EJNS4_8CallbackIFvvEESt19_Sp_make_shared_tagPT_RKT0_DpOT1_ > ../../../include/c++/4.9.2/bits/shared_ptr_base.h:616:25 > #2 0x7fe3255f5b7a in > _ZNSt12__shared_ptrIN4kudu15ClosureRunnableELN9__gnu_cxx12_Lock_policyE2EEC2ISaIS1_EJNS0_8CallbackIFvvEESt19_Sp_make_shared_tagRKT_DpOT0_ > ../../../include/c++/4.9.2/bits/shared_ptr_base.h:1089:14 > #3 0x7fe3255f5a5f in > _ZSt15allocate_sharedIN4kudu15ClosureRunnableESaIS1_EJNS0_8CallbackIFvvESt10shared_ptrIT_ERKT0_DpOT1_ > ../../../include/c++/4.9.2/bits/shared_ptr.h:587:14 > #4 0x7fe3255ed9c0 in > _ZSt11make_sharedIN4kudu15ClosureRunnableEJNS0_8CallbackIFvvESt10shared_ptrIT_EDpOT0_ > 
../../../include/c++/4.9.2/bits/shared_ptr.h:603:14 > #5 0x7fe3255ea383 in kudu::ThreadPool::SubmitClosure(kudu::Callback ()()>) > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:450:17 > > #6 0x7fe32e4a42ff in kudu::log::Log::AppendThread::Wake() > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/log.cc:289:5 > > #7 0x7fe32e4af94f in > kudu::log::Log::AsyncAppend(std::unique_ptr std::default_delete >, kudu::Callback ()(kudu::Status const&)> const&) > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/log.cc:602:19 > > #8 0x7fe32e4affbf in > kudu::log::Log::AsyncAppendReplicates(std::vector, > std::allocator > > > const&, kudu::Callback const&) > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/log.cc:614:10 > > #9 0x7fe32eb67994 in > kudu::consensus::LogCache::AppendOperations(std::vector, > std::allocator > > > const&, kudu::Callback const&) > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/log_cache.cc:213:29 > > #10 0x7fe32eb0b99e in > kudu::consensus::PeerMessageQueue::AppendOperations(std::vector, > std::allocator > > > const&, kudu::Callback const&) > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_queue.cc:403:3 > > #11 0x7fe32ebc8df0 in > kudu::consensus::RaftConsensus::UpdateReplica(kudu::consensus::ConsensusRequestPB > const*, kudu::consensus::ConsensusResponsePB*) > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/raft_consensus.cc:1451:7 > > #12 0x7fe32ebc52bf in > kudu::consensus::RaftConsensus::Update(kudu::consensus::ConsensusRequestPB > const*, kudu::consensus::ConsensusResponsePB*) > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/raft_consensus.cc:914:14 > > #13 0x7fe331bbb369 in > kudu::tserver::ConsensusServiceImpl::UpdateConsensus(kudu::consensus::ConsensusRequestPB > const*, kudu::consensus::ConsensusResponsePB*, kudu::rpc::RpcContext*) > 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tserver/tablet_service.cc:946:25 > > #14 0x7fe3293f5cb9 in std::_Function_handler ()(google::protobuf::Message const*, google::protobuf::Message*, > kudu::rpc::RpcContext*), > kudu::consensus::ConsensusServiceIf::ConsensusServiceIf(scoped_refptr > const&, scoped_refptr > const&)::$_1>::_M_invoke(std::_Any_data const&, google::protobuf::Message > const*, google::protobuf::Message*, kudu::rpc::RpcContext*) > ../../../include/c++/4.9.2/functional:2039:2 > #15 0x7fe32841e2fb in std::function google::protobuf::Message*, > kudu::rpc::RpcContext*)>::operator()(google::protobuf::Message const*, > google::protobuf::Message*, kudu::rpc::RpcContext*) const > ../../../include/c++/4.9.2/functional:2439:14 > #16 0x7fe32841cd6a in > kudu::rpc::Gene
[jira] [Created] (KUDU-2583) LeakSanitizer failure in kudu-admin-test
Mike Percy created KUDU-2583: Summary: LeakSanitizer failure in kudu-admin-test Key: KUDU-2583 URL: https://issues.apache.org/jira/browse/KUDU-2583 Project: Kudu Issue Type: Improvement Reporter: Mike Percy Saw this error in an automated test run from kudu-admin-test in DDLDuringRebalancingTest.TablesCreatedAndDeletedDuringRebalancing/0: {code:java} ==27773==ERROR: LeakSanitizer: detected memory leaks Direct leak of 50 byte(s) in 1 object(s) allocated from: #0 0x531928 in operator new(unsigned long) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:92 #1 0x377b29c3c8 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator const&) (/usr/lib64/libstdc++.so.6+0x377b29c3c8) Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x531928 in operator new(unsigned long) /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/src/llvm-6.0.0.src/projects/compiler-rt/lib/asan/asan_new_delete.cc:92 #1 0x7fe3255f5ccf in _ZNSt14__shared_countILN9__gnu_cxx12_Lock_policyE2EEC2IN4kudu15ClosureRunnableESaIS5_EJNS4_8CallbackIFvvEESt19_Sp_make_shared_tagPT_RKT0_DpOT1_ ../../../include/c++/4.9.2/bits/shared_ptr_base.h:616:25 #2 0x7fe3255f5b7a in _ZNSt12__shared_ptrIN4kudu15ClosureRunnableELN9__gnu_cxx12_Lock_policyE2EEC2ISaIS1_EJNS0_8CallbackIFvvEESt19_Sp_make_shared_tagRKT_DpOT0_ ../../../include/c++/4.9.2/bits/shared_ptr_base.h:1089:14 #3 0x7fe3255f5a5f in _ZSt15allocate_sharedIN4kudu15ClosureRunnableESaIS1_EJNS0_8CallbackIFvvESt10shared_ptrIT_ERKT0_DpOT1_ ../../../include/c++/4.9.2/bits/shared_ptr.h:587:14 #4 0x7fe3255ed9c0 in _ZSt11make_sharedIN4kudu15ClosureRunnableEJNS0_8CallbackIFvvESt10shared_ptrIT_EDpOT0_ ../../../include/c++/4.9.2/bits/shared_ptr.h:603:14 #5 0x7fe3255ea383 in kudu::ThreadPool::SubmitClosure(kudu::Callback) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/threadpool.cc:450:17 #6 0x7fe32e4a42ff in kudu::log::Log::AppendThread::Wake() 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/log.cc:289:5 #7 0x7fe32e4af94f in kudu::log::Log::AsyncAppend(std::unique_ptr >, kudu::Callback const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/log.cc:602:19 #8 0x7fe32e4affbf in kudu::log::Log::AsyncAppendReplicates(std::vector, std::allocator > > const&, kudu::Callback const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/log.cc:614:10 #9 0x7fe32eb67994 in kudu::consensus::LogCache::AppendOperations(std::vector, std::allocator > > const&, kudu::Callback const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/log_cache.cc:213:29 #10 0x7fe32eb0b99e in kudu::consensus::PeerMessageQueue::AppendOperations(std::vector, std::allocator > > const&, kudu::Callback const&) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/consensus_queue.cc:403:3 #11 0x7fe32ebc8df0 in kudu::consensus::RaftConsensus::UpdateReplica(kudu::consensus::ConsensusRequestPB const*, kudu::consensus::ConsensusResponsePB*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/raft_consensus.cc:1451:7 #12 0x7fe32ebc52bf in kudu::consensus::RaftConsensus::Update(kudu::consensus::ConsensusRequestPB const*, kudu::consensus::ConsensusResponsePB*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/consensus/raft_consensus.cc:914:14 #13 0x7fe331bbb369 in kudu::tserver::ConsensusServiceImpl::UpdateConsensus(kudu::consensus::ConsensusRequestPB const*, kudu::consensus::ConsensusResponsePB*, kudu::rpc::RpcContext*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tserver/tablet_service.cc:946:25 #14 0x7fe3293f5cb9 in std::_Function_handler const&, scoped_refptr const&)::$_1>::_M_invoke(std::_Any_data const&, google::protobuf::Message const*, google::protobuf::Message*, kudu::rpc::RpcContext*) ../../../include/c++/4.9.2/functional:2039:2 #15 0x7fe32841e2fb in std::function::operator()(google::protobuf::Message 
const*, google::protobuf::Message*, kudu::rpc::RpcContext*) const ../../../include/c++/4.9.2/functional:2439:14 #16 0x7fe32841cd6a in kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/rpc/service_if.cc:139:3 #17 0x7fe328420d87 in kudu::rpc::ServicePool::RunThread() /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/rpc/service_pool.cc:225:15 #18 0x7fe328426612 in boost::_bi::bind_t, boost::_bi::list1 > >::operator()() /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/uninstrumented/include/boost/bind/bind.hpp:1222:16 #19 0x7fe32837bf1b in boost::function0::operator()() const /data/somelongdirectorytoavoidrpathissues/src/kudu/thirdparty/installed/uninstrumented/include/bo
[jira] [Resolved] (KUDU-2559) kudu-tool-test TestLoadgenDatabaseName fails with a memory leak
[ https://issues.apache.org/jira/browse/KUDU-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy resolved KUDU-2559. -- Resolution: Cannot Reproduce Fix Version/s: n/a Resolving this as cannot reproduce because there is no log file attached; please reopen if you have the file! > kudu-tool-test TestLoadgenDatabaseName fails with a memory leak > --- > > Key: KUDU-2559 > URL: https://issues.apache.org/jira/browse/KUDU-2559 > Project: Kudu > Issue Type: Bug > Components: ksck >Reporter: Andrew Wong >Priority: Major > Fix For: n/a > > Attachments: kudu-tool-test.2.xml > > > I've attached a log with the LeakSanitizer error, though looking at the test > itself and the error, it isn't clear to me why the issue would be specific to > this test. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (KUDU-2562) Checkpoint highest legal timestamp in tablet superblock when tablet history GC deletes data
[ https://issues.apache.org/jira/browse/KUDU-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2562: - Description: Checkpoint the highest legal timestamp in the tablet superblock when tablet history GC deletes data so that increasing the AHM age doesn’t expose us to inconsistent scans after a GC. This is a real edge case and is a temporary condition depending on users restarting with a changed configuration flag. However without this safety feature, users can get bad scans if they increase the {{[--tablet_history_max_age_sec|http://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_tablet_history_max_age_sec]}} command-line flag after a GC operation runs. was: Checkpoint the highest legal timestamp in the tablet superblock when tablet history GC deletes data so that increasing the AHM age doesn’t expose us to inconsistent scans after a GC. This is a real edge case and is a temporary condition depending on users restarting with a changed configuration flag. However without this safety feature, users can get bad scans if they change the flag. > Checkpoint highest legal timestamp in tablet superblock when tablet history > GC deletes data > --- > > Key: KUDU-2562 > URL: https://issues.apache.org/jira/browse/KUDU-2562 > Project: Kudu > Issue Type: Improvement > Components: tablet >Affects Versions: 1.7.1 >Reporter: Mike Percy >Priority: Minor > > Checkpoint the highest legal timestamp in the tablet superblock when tablet > history GC deletes data so that increasing the AHM age doesn’t expose us to > inconsistent scans after a GC. > This is a real edge case and is a temporary condition depending on users > restarting with a changed configuration flag. However without this safety > feature, users can get bad scans if they increase the > {{[--tablet_history_max_age_sec|http://kudu.apache.org/docs/configuration_reference.html#kudu-tserver_tablet_history_max_age_sec]}} > command-line flag after a GC operation runs. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
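[Editor's note] The safeguard proposed in KUDU-2562 above can be sketched as follows. This is a hypothetical illustration in Java (Kudu's tablet code is C++, and none of these names exist in it): history GC records the ancient-history mark (AHM) it used, and later snapshot scans below that checkpoint are rejected even if --tablet_history_max_age_sec was raised after the GC.

```java
/**
 * Sketch of checkpointing the highest legal scan timestamp at history GC
 * time. In the real proposal the checkpoint would be persisted in the
 * tablet superblock; here it is just a field.
 */
public class HistoryGcCheckpoint {
  // Highest timestamp at or below which history has been GC'd away.
  private long checkpointedAhm = 0;

  /** Called by history GC after deleting undo deltas up to {@code ahm}. */
  public void recordGc(long ahm) {
    checkpointedAhm = Math.max(checkpointedAhm, ahm);
  }

  /** Snapshot scans below the checkpoint would see partial history. */
  public boolean isScanTimestampLegal(long snapshotTs) {
    return snapshotTs >= checkpointedAhm;
  }
}
```

Without the persisted checkpoint, raising the flag moves the freshly computed AHM backwards and scans in that gap can silently return inconsistent results; with it, such scans can fail cleanly instead.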
[jira] [Created] (KUDU-2577) Support rebalancing data allocation across directories when adding a new data dir
Mike Percy created KUDU-2577: Summary: Support rebalancing data allocation across directories when adding a new data dir Key: KUDU-2577 URL: https://issues.apache.org/jira/browse/KUDU-2577 Project: Kudu Issue Type: Improvement Components: ops-tooling, tablet Affects Versions: 1.7.0 Reporter: Mike Percy I got a request for a tool to rebalance data usage across a single server's data directories when adding a data dir. There is no such tool, but I wanted to document that request because it's a reasonable feature to have. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-686) Delta apply optimizations
[ https://issues.apache.org/jira/browse/KUDU-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612466#comment-16612466 ] Mike Percy commented on KUDU-686: - [~adar], would you mind elaborating more on the new approach, what it solves, and how it does it? > Delta apply optimizations > - > > Key: KUDU-686 > URL: https://issues.apache.org/jira/browse/KUDU-686 > Project: Kudu > Issue Type: Improvement > Components: perf, tablet >Affects Versions: M4.5 >Reporter: David Alves >Assignee: Adar Dembo >Priority: Trivial > > We currently iterate on each delta file several times, one for deletes and > then one for each one of the columns. > It seems that, when selecting all the columns, it would be more efficient to > apply the deltas to all columns at the same time. This might or might not be > advantageous depending on the number of columns projected. Todd also suggests > that whether this is an advantage also depends on whether there are > predicates being pushed down. > We could likely also merge the updates and deletes into a single iteration or > at least avoid applying the mutations if the row will end up deleted (right > now we still apply the updates even when we find that the row will be > deleted). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (KUDU-2563) Spark integration should implement scanner keep-alive API
[ https://issues.apache.org/jira/browse/KUDU-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610906#comment-16610906 ] Mike Percy commented on KUDU-2563: -- It appears that we'll have to expose scanner keepalive to the Java API before implementing Spark support, because the only place I see the keepalive API defined is in the C++ client KuduScanner class @ http://kudu.apache.org/releases/1.7.1/cpp-client-api/classkudu_1_1client_1_1KuduScanner.html#aa4a0caf7142880255d7aac1d75f33d21 > Spark integration should implement scanner keep-alive API > - > > Key: KUDU-2563 > URL: https://issues.apache.org/jira/browse/KUDU-2563 > Project: Kudu > Issue Type: Improvement > Components: client, spark >Affects Versions: 1.7.1 >Reporter: Mike Percy >Assignee: Grant Henke >Priority: Major > > The Spark integration should implement the scanner keep-alive API like the > Impala scanner does in order to avoid errors related to scanners timing out.
[jira] [Created] (KUDU-2563) Spark integration should implement scanner keep-alive API
Mike Percy created KUDU-2563: Summary: Spark integration should implement scanner keep-alive API Key: KUDU-2563 URL: https://issues.apache.org/jira/browse/KUDU-2563 Project: Kudu Issue Type: Improvement Components: client, spark Affects Versions: 1.7.1 Reporter: Mike Percy The Spark integration should implement the scanner keep-alive API like the Impala scanner does in order to avoid errors related to scanners timing out.
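To illustrate the pattern the integration needs, here is a hedged sketch of a keep-alive wrapper. The `KeepAliveScanner` class, the `keep_alive()` method name, and the TTL value are all hypothetical; the only confirmed API is the C++ `KuduScanner::KeepAlive()` call linked above. The idea: while the caller is slow processing a batch, ping the scanner before the server-side scanner TTL expires.

```python
import time

class KeepAliveScanner:
    """Hypothetical wrapper that pings a scanner so it isn't reaped by
    the server-side scanner TTL while the caller processes rows."""

    def __init__(self, scanner, ttl_secs=60.0, clock=time.monotonic):
        self.scanner = scanner
        self.ttl_secs = ttl_secs
        self.clock = clock
        self.last_ping = clock()
        self.pings = 0

    def maybe_keep_alive(self):
        # Ping at half the TTL so a single missed ping is not fatal.
        if self.clock() - self.last_ping >= self.ttl_secs / 2:
            self.scanner.keep_alive()  # hypothetical keep-alive RPC
            self.last_ping = self.clock()
            self.pings += 1
```

A Spark RDD iterator would call `maybe_keep_alive()` (or its Java equivalent) once per row or per batch; the time check makes the actual RPC cheap to attempt frequently.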
[jira] [Created] (KUDU-2562) Checkpoint highest legal timestamp in tablet superblock when tablet history GC deletes data
Mike Percy created KUDU-2562: Summary: Checkpoint highest legal timestamp in tablet superblock when tablet history GC deletes data Key: KUDU-2562 URL: https://issues.apache.org/jira/browse/KUDU-2562 Project: Kudu Issue Type: Improvement Components: tablet Affects Versions: 1.7.1 Reporter: Mike Percy Checkpoint the highest legal timestamp in the tablet superblock when tablet history GC deletes data, so that increasing the AHM age doesn't expose us to inconsistent scans after a GC. This is a real edge case, and the condition is temporary, arising only when users restart with a changed configuration flag. However, without this safety feature, users can get inconsistent scans if they change the flag.
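The safety check being proposed can be sketched as follows. The function and parameter names are illustrative assumptions, not Kudu's actual code: a snapshot scan at timestamp T is only legal if T is at or above both the ancient history mark (AHM) and the checkpointed GC high-water mark, so that raising the AHM age after a GC cannot re-admit scans over already-deleted history.

```python
def scan_timestamp_is_legal(scan_ts, now, ahm_age_secs, gc_checkpoint_ts):
    """Return True if a snapshot scan at scan_ts can be served consistently.

    ahm_age_secs: configured ancient-history-mark age.
    gc_checkpoint_ts: highest timestamp at which history GC has already
    deleted data -- the value this issue proposes persisting in the
    tablet superblock. Illustrative sketch only.
    """
    ahm = now - ahm_age_secs
    # Without the checkpoint, raising ahm_age_secs after a GC would push
    # the AHM below gc_checkpoint_ts and wrongly admit scans over
    # history that has already been deleted.
    lowest_legal = max(ahm, gc_checkpoint_ts)
    return scan_ts >= lowest_legal
```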
[jira] [Created] (KUDU-2516) Add NOT EQUAL predicate type
Mike Percy created KUDU-2516: Summary: Add NOT EQUAL predicate type Key: KUDU-2516 URL: https://issues.apache.org/jira/browse/KUDU-2516 Project: Kudu Issue Type: Sub-task Components: cfile, perf Affects Versions: 1.7.1 Reporter: Mike Percy Kudu currently does not have support for a NOT_EQUAL predicate type. This is usually relevant when AND-ed together with other predicates.
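The semantics are simple to state. In this hedged sketch (the op names and the row-filtering helper are illustrative, not the client API), NOT_EQUAL composes with other predicates under Kudu's AND-only predicate model:

```python
import operator

# Illustrative predicate ops; Kudu's real predicate types live in the
# client libraries and do not yet include NOT_EQUAL (hence this issue).
OPS = {
    "EQUAL": operator.eq,
    "NOT_EQUAL": operator.ne,
    "GREATER_EQUAL": operator.ge,
    "LESS": operator.lt,
}

def matches(row, predicates):
    """All predicates are AND-ed, matching Kudu's predicate semantics."""
    return all(OPS[op](row[col], value) for col, op, value in predicates)

rows = [
    {"id": 1, "state": "ok"},
    {"id": 2, "state": "failed"},
    {"id": 3, "state": "ok"},
]
preds = [("id", "GREATER_EQUAL", 2), ("state", "NOT_EQUAL", "failed")]
survivors = [r["id"] for r in rows if matches(r, preds)]
```

The cfile/perf component tags suggest the interesting part is pushing this down into the columnar scan path, not the surface semantics shown here.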
[jira] [Created] (KUDU-2515) Implement Spark join optimization support
Mike Percy created KUDU-2515: Summary: Implement Spark join optimization support Key: KUDU-2515 URL: https://issues.apache.org/jira/browse/KUDU-2515 Project: Kudu Issue Type: Improvement Affects Versions: 1.7.1 Reporter: Mike Percy At the time of writing, Spark is not able to properly optimize joins on Kudu tables because Kudu does not provide statistics for Spark to use to determine the optimal join strategy. It would be a big improvement to find some way to help Spark optimize joins between Kudu tables or between Kudu tables and Parquet-on-HDFS tables.
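A sketch of what size statistics would enable: Spark broadcasts the smaller side of a join when its estimated size falls under `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). The helper below is an illustrative assumption of that decision, not Spark's actual planner code; without statistics a source must be treated as "large", forcing a shuffle join.

```python
# Default mirrors Spark's spark.sql.autoBroadcastJoinThreshold (10 MB);
# the function itself is an illustrative sketch, not Spark's planner.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024

def choose_join_strategy(left_bytes, right_bytes,
                         threshold=BROADCAST_THRESHOLD_BYTES):
    """Pick a join strategy from estimated input sizes.

    Without size statistics (Kudu's current situation), a planner must
    assume tables are large and fall back to a shuffle join.
    """
    smaller = min(left_bytes, right_bytes)
    if smaller <= threshold:
        return "broadcast"
    return "shuffle"
```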
[jira] [Updated] (KUDU-2513) Fix Flume sink class names on Kudu Flume Sink blog post
[ https://issues.apache.org/jira/browse/KUDU-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2513: - Component/s: documentation > Fix Flume sink class names on Kudu Flume Sink blog post > --- > > Key: KUDU-2513 > URL: https://issues.apache.org/jira/browse/KUDU-2513 > Project: Kudu > Issue Type: Improvement > Components: documentation, flume-sink >Affects Versions: 1.7.1 >Reporter: Mike Percy >Priority: Major > Labels: blog, newbie > > The blog post for the Kudu Flume sink is the easiest documentation for using > it but the class names have changed since it was posted and it's out of date. > We should fix the examples. > https://kudu.apache.org/2016/08/31/intro-flume-kudu-sink.html
[jira] [Updated] (KUDU-2513) Fix Flume sink class names on Kudu Flume Sink blog post
[ https://issues.apache.org/jira/browse/KUDU-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2513: - Labels: blog newbie (was: ) > Fix Flume sink class names on Kudu Flume Sink blog post > --- > > Key: KUDU-2513 > URL: https://issues.apache.org/jira/browse/KUDU-2513 > Project: Kudu > Issue Type: Improvement > Components: flume-sink >Affects Versions: 1.7.1 >Reporter: Mike Percy >Priority: Major > Labels: blog, newbie > > The blog post for the Kudu Flume sink is the easiest documentation for using > it but the class names have changed since it was posted and it's out of date. > We should fix the examples. > https://kudu.apache.org/2016/08/31/intro-flume-kudu-sink.html
[jira] [Created] (KUDU-2513) Fix Flume sink class names on Kudu Flume Sink blog post
Mike Percy created KUDU-2513: Summary: Fix Flume sink class names on Kudu Flume Sink blog post Key: KUDU-2513 URL: https://issues.apache.org/jira/browse/KUDU-2513 Project: Kudu Issue Type: Improvement Components: flume-sink Affects Versions: 1.7.1 Reporter: Mike Percy The blog post for the Kudu Flume sink is the easiest documentation for using it but the class names have changed since it was posted and it's out of date. We should fix the examples. https://kudu.apache.org/2016/08/31/intro-flume-kudu-sink.html
[jira] [Commented] (KUDU-2411) Create a public test utility artifact
[ https://issues.apache.org/jira/browse/KUDU-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554510#comment-16554510 ] Mike Percy commented on KUDU-2411: -- Below is a link to a bare-bones binary test artifact that I built on CentOS 6 and was able to run --help on under Ubuntu 16.04. It's not a release version (1.8.0-SNAPSHOT), it contains snapshots of all the security libs, it should never be used "in production", and, as I said, it is not really tested yet. I think it probably works, though. https://drive.google.com/file/d/187tpUZJP-SiMsMVbj-9FcATUbQXmuUiy/ > Create a public test utility artifact > - > > Key: KUDU-2411 > URL: https://issues.apache.org/jira/browse/KUDU-2411 > Project: Kudu > Issue Type: Improvement > Components: java >Affects Versions: 1.7.0 >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > Labels: community > > Create a public published test utility jar that contains useful testing > utilities for applications that integrate with Kudu including things like > BaseKuduTest.java and MiniKuduCluster.java. > This has the added benefit of eliminating the unusual dependency on all of > kudu-clients test in each of the other java modules. This could likely be > used in our examples code too.
[jira] [Commented] (KUDU-2411) Create a public test utility artifact
[ https://issues.apache.org/jira/browse/KUDU-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554496#comment-16554496 ] Mike Percy commented on KUDU-2411: -- Hi [~timrobertson100], I can upload an initial Linux version of the binary tarball somewhere for you to try out. We will hopefully get the relevant scripts merged into the Kudu main line eventually, so collaborating via Gerrit would be ideal because other Kudu devs will see the patches. But if you want to start off with a GitHub repo to minimize bootstrapping overhead before pushing patches to Gerrit, I'm open to that as well. > Create a public test utility artifact > - > > Key: KUDU-2411 > URL: https://issues.apache.org/jira/browse/KUDU-2411 > Project: Kudu > Issue Type: Improvement > Components: java >Affects Versions: 1.7.0 >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > Labels: community > > Create a public published test utility jar that contains useful testing > utilities for applications that integrate with Kudu including things like > BaseKuduTest.java and MiniKuduCluster.java. > This has the added benefit of eliminating the unusual dependency on all of > kudu-clients test in each of the other java modules. This could likely be > used in our examples code too.
[jira] [Created] (KUDU-2506) Improve and document docs push procedure
Mike Percy created KUDU-2506: Summary: Improve and document docs push procedure Key: KUDU-2506 URL: https://issues.apache.org/jira/browse/KUDU-2506 Project: Kudu Issue Type: Improvement Components: documentation Affects Versions: 1.7.1 Reporter: Mike Percy As of this writing, when we want to push docs for a release, the release docs will overwrite the existing "unversioned" docs. This is a problem for maintenance releases, such as releasing a 1.5.1 after a 1.6.0 release, since the unversioned 1.6.0 docs living at [http://kudu.apache.org/docs/] will be replaced with 1.5.1 docs. We should improve this process and the scripts that drive it. Potential improvements:
* Add an option to the docs publish script to only update the versioned docs, i.e. --versioned-only
* Separate out master vs. versioned docs push into separate script invocations
* Create a Jenkins job that can build and deploy docs to either /docs or a release docs location.
[jira] [Created] (KUDU-2505) Add menu for switching between docs versions to Kudu web site docs
Mike Percy created KUDU-2505: Summary: Add menu for switching between docs versions to Kudu web site docs Key: KUDU-2505 URL: https://issues.apache.org/jira/browse/KUDU-2505 Project: Kudu Issue Type: Improvement Components: documentation Affects Versions: 1.7.1 Reporter: Mike Percy It would be useful to have a "version switcher" widget on the Kudu documentation page that allows people to navigate to another version of the docs from wherever they are, in case they land on the wrong version from a Google search.
[jira] [Created] (KUDU-2504) Add Kudu version number to header of docs pages
Mike Percy created KUDU-2504: Summary: Add Kudu version number to header of docs pages Key: KUDU-2504 URL: https://issues.apache.org/jira/browse/KUDU-2504 Project: Kudu Issue Type: Improvement Components: documentation Affects Versions: 1.7.1 Reporter: Mike Percy It is currently not easy to tell which version of the docs you are looking at when you are on the "unversioned" section of the Kudu docs at [http://kudu.apache.org/docs/]. We should add a header or a little strip to the top of each page that says something like "you are looking at version 1.7.1 of the docs" or "you are looking at docs for version 1.8.0-SNAPSHOT generated from Git commit eee82d90a54108f2d7e18e84ec0bbd391fcc129a".
[jira] [Commented] (KUDU-2411) Create a public test utility artifact
[ https://issues.apache.org/jira/browse/KUDU-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542306#comment-16542306 ] Mike Percy commented on KUDU-2411: -- I threw together a proof-of-concept of this today by building on EL6, copying all the deps (except for a few system libs like libpthread, libc, libdl, and libgcc), changing the rpath to point to the copied deps, and running the result on Ubuntu 16.04. I was able to run the binaries to the point that --help did not crash. I'm going to work on making generation of such an artifact a bit less hacky, and I'll try a more interesting test when I find a few spare minutes to work on it. > Create a public test utility artifact > - > > Key: KUDU-2411 > URL: https://issues.apache.org/jira/browse/KUDU-2411 > Project: Kudu > Issue Type: Improvement > Components: java >Affects Versions: 1.7.0 >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > Labels: community > > Create a public published test utility jar that contains useful testing > utilities for applications that integrate with Kudu including things like > BaseKuduTest.java and MiniKuduCluster.java. > This has the added benefit of eliminating the unusual dependency on all of > kudu-clients test in each of the other java modules. This could likely be > used in our examples code too.
[jira] [Updated] (KUDU-2411) Create a public test utility artifact
[ https://issues.apache.org/jira/browse/KUDU-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2411: - Labels: community (was: ) > Create a public test utility artifact > - > > Key: KUDU-2411 > URL: https://issues.apache.org/jira/browse/KUDU-2411 > Project: Kudu > Issue Type: Improvement > Components: java >Affects Versions: 1.7.0 >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > Labels: community > > Create a public published test utility jar that contains useful testing > utilities for applications that integrate with Kudu including things like > BaseKuduTest.java and MiniKuduCluster.java. > This has the added benefit of eliminating the unusual dependency on all of > kudu-clients test in each of the other java modules. This could likely be > used in our examples code too.
[jira] [Updated] (KUDU-2411) Create a public test utility artifact
[ https://issues.apache.org/jira/browse/KUDU-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2411: - Component/s: (was: community) > Create a public test utility artifact > - > > Key: KUDU-2411 > URL: https://issues.apache.org/jira/browse/KUDU-2411 > Project: Kudu > Issue Type: Improvement > Components: java >Affects Versions: 1.7.0 >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > > Create a public published test utility jar that contains useful testing > utilities for applications that integrate with Kudu including things like > BaseKuduTest.java and MiniKuduCluster.java. > This has the added benefit of eliminating the unusual dependency on all of > kudu-clients test in each of the other java modules. This could likely be > used in our examples code too.
[jira] [Updated] (KUDU-2411) Create a public test utility artifact
[ https://issues.apache.org/jira/browse/KUDU-2411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Percy updated KUDU-2411: - Component/s: community > Create a public test utility artifact > - > > Key: KUDU-2411 > URL: https://issues.apache.org/jira/browse/KUDU-2411 > Project: Kudu > Issue Type: Improvement > Components: community, java >Affects Versions: 1.7.0 >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > > Create a public published test utility jar that contains useful testing > utilities for applications that integrate with Kudu including things like > BaseKuduTest.java and MiniKuduCluster.java. > This has the added benefit of eliminating the unusual dependency on all of > kudu-clients test in each of the other java modules. This could likely be > used in our examples code too.
[jira] [Created] (KUDU-2486) Leader should back off heartbeating to failed followers
Mike Percy created KUDU-2486: Summary: Leader should back off heartbeating to failed followers Key: KUDU-2486 URL: https://issues.apache.org/jira/browse/KUDU-2486 Project: Kudu Issue Type: Improvement Components: consensus Affects Versions: 1.7.1 Reporter: Mike Percy At the time of writing, the replica leader -> follower heartbeat mechanism does not have a backoff mechanism built in. Rather, it simply sends a heartbeat every configured period (say, 500 ms). If a server is offline, this can cause log spam until that replica is evicted, and if a server is overloaded, the lack of backoff compounds the problem. Since we now have pre-election support, having leaders slow down their heartbeat attempts when follower requests return errors should not cause unnecessary leader elections, so backing off is feasible.
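The backoff being proposed can be sketched in a few lines. This is an illustrative assumption of one reasonable policy (capped exponential backoff keyed on consecutive failures), not Kudu's implementation; the base period and cap are made-up numbers:

```python
def heartbeat_delay_ms(consecutive_failures, base_ms=500, max_ms=60_000):
    """Capped exponential backoff for leader -> follower heartbeats.

    Zero failures yields the normal heartbeat period; each consecutive
    failure doubles the delay, up to a cap. Illustrative sketch only.
    """
    return min(base_ms * (2 ** consecutive_failures), max_ms)
```

Any successful response would reset the failure counter to zero, restoring the normal heartbeat period immediately.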
[jira] [Created] (KUDU-2484) ksck should show hostname in addition to "TS unavailable" when a host is down
Mike Percy created KUDU-2484: Summary: ksck should show hostname in addition to "TS unavailable" when a host is down Key: KUDU-2484 URL: https://issues.apache.org/jira/browse/KUDU-2484 Project: Kudu Issue Type: Improvement Components: ops-tooling Affects Versions: 1.6.0 Reporter: Mike Percy ksck should show the hostname in addition to "TS unavailable" in the consensus matrix when a host is down, so it's easier to troubleshoot consensus errors.
[jira] [Commented] (KUDU-2438) Class relocation in the maven build should be the same as in the gradle build
[ https://issues.apache.org/jira/browse/KUDU-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518339#comment-16518339 ] Mike Percy commented on KUDU-2438: -- This is good stuff. I think we should bring it forward at some point, but there's no need for you to push it through. Thanks for the WIP, Ferenc! > Class relocation in the maven build should be the same as in the gradle build > - > > Key: KUDU-2438 > URL: https://issues.apache.org/jira/browse/KUDU-2438 > Project: Kudu > Issue Type: Bug >Reporter: Ferenc Szabo >Assignee: Ferenc Szabo >Priority: Major > > The shaded jars from Maven reference the original Guava classes, > for example. > The maven-shade-plugin should be configured to relocate them.
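For reference, the fix described would look something like the following maven-shade-plugin configuration fragment. The shaded package prefix shown is an assumption (it should be made to match whatever prefix the Gradle build's shading uses, which this sketch does not confirm); only the `<relocation>` mechanism itself is the plugin's documented feature.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>com.google.common</pattern>
        <!-- Shaded prefix is illustrative; use the same prefix as the
             Gradle build so the two builds produce identical jars. -->
        <shadedPattern>org.apache.kudu.shaded.com.google.common</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```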