[kudu-CR] Implement BloomFilter Predicate in server side.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 ) Change subject: Implement BloomFilter Predicate in server side. .. Patch Set 9: (15 comments) http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc File src/kudu/common/column_predicate-test.cc: http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc@881 PS9, Line 881: InList Should this be 'Equality'? http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h File src/kudu/common/column_predicate.h: http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@230 PS9, Line 230: bool EvaluateCellForBloomFilter(DataType type, const void* cell) const; Is this used anywhere? If it's test only please indicate that in the doc. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@376 PS9, Line 376: typedef typename DataTypeTraits::cpp_type cpp_type; : size_t size = sizeof(cpp_type); : const void* data = cell; : if (PhysicalType == BINARY) { : const Slice *slice = reinterpret_cast(cell); : size = slice->size(); : data = slice->data(); : } : Slice cell_slice(reinterpret_cast(data), size); : BloomKeyProbe probe(cell_slice, bf.hash_algorithm()); This entire portion could be hoisted outside the for loop, which saves recomputing the probe in the case of multiple filters. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@386 PS9, Line 386: if (!BloomFilter(bf.bloom_data(), bf.nhash()).MayContainKey(probe)) { Once this for loop is simplified to just be this call, it should further simplify to just use std::all_of http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc File src/kudu/common/column_predicate.cc: http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@115 PS9, Line 115: return ColumnPredicate(PredicateType::InBloomFilter, move(column), bfs, lower, upper); I think this should be calling Simplify(), or if there's a reason not to please leave a comment. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@385 PS9, Line 385: bloom_filters_.insert(bloom_filters_.end(), other.bloom_filters().begin(), Pretty sure this can be simplified to a straight copy, eg: bloom_filters_ = other.bloom_filters_; http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@387 PS9, Line 387: if (other.lower_ != nullptr && This range portion of this code doesn't need to be duplicated if you move InBloomFilter above Range, and let it fall-through. Make sure to annotate with FALLTHROUGH_INTENDED, though. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@588 PS9, Line 588: if (new_values.empty()) { Simplify() should already handle this, so I think you can reduce this to just the else clause. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@644 PS9, Line 644: new_values.emplace_back(value); can this use copy_if like above? I think it's more elegant that way. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@648 PS9, Line 648: SetToNone(); likewise, I think this is already handled by the Simplify() call. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@661 PS9, Line 661: if (other.lower_ != nullptr && likewise I think this can be simplified with the fallthrough trick. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@895 PS9, Line 895: case PredicateType::InBloomFilter: { This could be simplified by moving it above Range and allowing it to fall through to check the range portion (after checking the annotations). Annotate with FALLTHROUGH_INTENDED if you do that. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@899 PS9, Line 899: auto bound_equal = [&] (const void* eleml, const void* elemr) { This is nice, consider doing the same simplification for Range. http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto File src/kudu/common/common.proto: http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto@347 PS9, Line 347: enum HashAlgorithmInBloomFilter { I suggested this back in PS5 but I think there was some confusion. I think you can call this just 'HashAlgorithm', and use it for both bloom filters as well as in HashBucketSchemaPB (where the existing one will be removed). To do that compatibly the existing HashAlgorithm values have to remain, so it should be: HashAlgorithm { UNKNOWN = 0; MURMUR_HASH_2 = 1; CITY_HASH =
[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11571 ) Change subject: KUDU-2563: [spark] Use the scanner keepAlive API .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/11571/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala: http://gerrit.cloudera.org:8080/#/c/11571/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@35 PS3, Line 35: > nit: remove. Done -- To view, visit http://gerrit.cloudera.org:8080/11571 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db Gerrit-Change-Number: 11571 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Fri, 05 Oct 2018 02:23:15 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API
Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11571 ) Change subject: KUDU-2563: [spark] Use the scanner keepAlive API .. KUDU-2563: [spark] Use the scanner keepAlive API Adds scheduled keepAlive calls to the scanner in the KuduRDD RowIterator. The period in which the calls are made is configurable via keepAlivePeriodMs and has a default of 15 seconds (which is 1/4 the default scanner ttl). This implementation is similar to the Impala integration. It checks if a call to the keepAlive API is needed as it processes each row. Compared to a background thread, this has the downside of being less consistently scheduled and susceptible to scenarios in which a single row takes longer to process than the ttl. However, because the scanner is not thread safe, this is the most straightforward solution and has been proven to work. Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db Reviewed-on: http://gerrit.cloudera.org:8080/11571 Reviewed-by: Grant Henke Tested-by: Grant Henke --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala 6 files changed, 139 insertions(+), 11 deletions(-) Approvals: Grant Henke: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/11571 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db Gerrit-Change-Number: 11571 Gerrit-PatchSet: 5 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API
Hello Mike Percy, Dan Burkert, Kudu Jenkins, Adar Dembo, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11571 to look at the new patch set (#4). Change subject: KUDU-2563: [spark] Use the scanner keepAlive API .. KUDU-2563: [spark] Use the scanner keepAlive API Adds scheduled keepAlive calls to the scanner in the KuduRDD RowIterator. The period in which the calls are made is configurable via keepAlivePeriodMs and has a default of 15 seconds (which is 1/4 the default scanner ttl). This implementation is similar to the Impala integration. It checks if a call to the keepAlive API is needed as it processes each row. Compared to a background thread, this has the downside of being less consistently scheduled and susceptible to scenarios in which a single row takes longer to process than the ttl. However, because the scanner is not thread safe, this is the most straightforward solution and has been proven to work. Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala 6 files changed, 139 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/11571/4 -- To view, visit http://gerrit.cloudera.org:8080/11571 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db Gerrit-Change-Number: 11571 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11571 ) Change subject: KUDU-2563: [spark] Use the scanner keepAlive API .. Patch Set 4: Verified+1 Code-Review+2 Carrying the +2 forward through the doc nit change. -- To view, visit http://gerrit.cloudera.org:8080/11571 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db Gerrit-Change-Number: 11571 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Fri, 05 Oct 2018 02:23:36 + Gerrit-HasComments: No
[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/11571 ) Change subject: KUDU-2563: [spark] Use the scanner keepAlive API .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/11571/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala: http://gerrit.cloudera.org:8080/#/c/11571/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@35 PS3, Line 35: . nit: remove. -- To view, visit http://gerrit.cloudera.org:8080/11571 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db Gerrit-Change-Number: 11571 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Fri, 05 Oct 2018 00:53:09 + Gerrit-HasComments: Yes
[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets
Hello Tidy Bot, Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Adar Dembo, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11554 to look at the new patch set (#11). Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets .. [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets ksck checksum scans allow the user to checksum with snapshot scans, so that a checksum can be done even as tablets are mutated. It also allows users to omit a snapshot timestamp. Previously, in this case, the snapshot timestamp would be retrieved from some healthy tablet server at the beginning of the checksum process, and used for every replica. This didn't work well for checksumming large tables, because eventually the snapshot timestamp fell before the ancient history mark, and subsequent checksums scans would not be accepted by the tablet servers. This changes how checksum scans work to address this problem: 1. A background process periodically updates timestamps from tablet servers. 2. The checksum process is reorganized so the replicas of one tablet are checksummed together. 3. When a tablet is about to be checksummed, and the checksum scan is a snapshot scan with no user-provided timestamp, the tablet is assigned an up-to-date timestamp from one of the tablet servers that hosts a replica. Every replica is then checksummed using this snapshot timestamp. 4. The original default timeout of 3600 seconds for a checksum scan is too low, but it didn't really matter because the default tablet history max age was 900 seconds. Now that checksum scans can continue for many hours, the default timeout is raised to 86400 seconds (1 day), and a new idle timeout is added. If a checksum process does not checksum an additional row for this idle timeout (default 10 minutes), it will idle time out. Note that there is a new scheduling problem given #2: each tablet server has a fixed number of slots for checksum scans, but every tablet server hosting a replica must have a slot available before any replica's checksum can start, so deciding in which order to checksum tablets and how to find which are available to schedule is important. Given that the bulk of the time in checksums is occupied waiting for tablet servers to read lots of data off disk, materialize it as rows, and checksum it, it's worth spending a lot of effort to make sure the cluster is fully utilized given the scan concurrency constraints. So, the tool uses a brute force approach and simply checks all tablets to see which can be checksummed, any time a replica checksum finishes and frees a slot. Tablets are considered in tablet id order. Since tablet ids are UUIDs, there should be no correlation between a tablet's id and how its replicas are distributed across tablet servers. There are several tests added: 1. For the KUDU-2179 fix itself. 2. For the idle timeout. 3. For when a checksum finds mismatches. Yes, we didn't have a test for this before. After adding this test I saw that the output is a little confusing since it reported the number of replicas with mismatches rather than the number of tablets, so I altered the output to fix that. 4. A couple of tests exercising situations when all tablet servers are unavailable and when all peers of a tablet are unavailable. I also checksummed a very large cluster with 500TB of data or so, across about 37000 replicas. The checksum scan completed successfully after more than 12 hours. Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476 --- M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_checksum.cc M src/kudu/tools/ksck_checksum.h M src/kudu/tools/ksck_remote-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/ksck_remote.h 7 files changed, 869 insertions(+), 276 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/11554/11 -- To view, visit http://gerrit.cloudera.org:8080/11554 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476 Gerrit-Change-Number: 11554 Gerrit-PatchSet: 11 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11571 ) Change subject: KUDU-2563: [spark] Use the scanner keepAlive API .. Patch Set 3: Code-Review+2 Would be nice to get another review though. -- To view, visit http://gerrit.cloudera.org:8080/11571 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db Gerrit-Change-Number: 11571 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Thu, 04 Oct 2018 23:57:28 + Gerrit-HasComments: No
[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11554 ) Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets .. Patch Set 9: (6 comments) http://gerrit.cloudera.org:8080/#/c/11554/10/src/kudu/tools/ksck-test.cc File src/kudu/tools/ksck-test.cc: http://gerrit.cloudera.org:8080/#/c/11554/10/src/kudu/tools/ksck-test.cc@21 PS10, Line 21: #include > Unused? Or is it? The 'timestamp_' member is an atomic. Also IWYU isn't complaining. http://gerrit.cloudera.org:8080/#/c/11554/10/src/kudu/tools/ksck-test.cc@1588 PS10, Line 1588: new_uuid, : std::make_shared(new_uuid)); > nit: add a space Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc File src/kudu/tools/ksck_checksum.cc: http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@298 PS7, Line 298: aut > Ah, thanks for doing that. I don't think I'm a fan of that; it's verbose an Done http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc File src/kudu/tools/ksck_checksum.cc: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@353 PS9, Line 353: : > nit: spacing Done http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@354 PS9, Line 354: DCHECK_GE(*slots_open, 0); : DCHECK_LE(*slots_open, opts_.scan_concurrency); > These are redundant now that we're actually using the values we already DCH They're technically not redundant but they are probably unnecessary. http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc File src/kudu/tools/ksck_remote-test.cc: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@416 PS9, Line 416: TestChecksumSnapshotLastingLongerThanAHM > Did you try to run this with --stress-cpu-threads=8 and many iterations usi 1000 times, no failures, DEBUG http://dist-test.cloudera.org/job?job_id=wdberkeley.1538696623.61201 -- To view, visit http://gerrit.cloudera.org:8080/11554 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476 Gerrit-Change-Number: 11554 Gerrit-PatchSet: 9 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 04 Oct 2018 23:52:48 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11571 ) Change subject: KUDU-2563: [spark] Use the scanner keepAlive API .. Patch Set 2: (7 comments) > It would be good to double check that KuduScanner.keepAlive() is well behaved > when the scan is "in between" two tablets. Ideally that would do nothing, but > at the very least it shouldn't try to send a keep alive RPC into the ether > and then throw an exception when that fails. This is already verified in the keepAlive API unit tests and given that there is no exposure to that here, I don't think it's worth testing here as well. http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala: http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@125 PS1, Line 125: * @param scanner the wrapped scanner : * @param kuduContext the kudu context > Update this. Done http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@141 PS1, Line 141:* Calls the keepAlive API on the current scanner if the keepAlivePeriodMs has passed. > I wish I had remembered earlier, but this isn't safe; scanners aren't threa Ah, right. Makes sense. I will switch to the read loop model then. http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@161 PS1, Line 161: } > Don't we need to do this if we were interrupted too? So maybe wrap the whol We wouldn't have needed to, given the interrupt only exists for shutting down. But I am removing the thread model anyway. http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala: http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@19 PS2, Line 19: import java.util.concurrent.Executors > Can you double check that you still need all of these new imports? Done http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@146 PS2, Line 146: LOG.error(s"Calling keepAlive on ${scanner.currentTablet}") > For debugging only? yep. removed. http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@147 PS2, Line 147: scanner.keepAlive() > Is it OK if this fails and throws a KuduException? yes, I would expect an exception to fail the job. http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala: http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@53 PS1, Line 53: val defaultKeepAlivePeriodMs: Long = 15000 // 25% of the default scanner ttl. > Should doc the choice in value here, probably related to the default value Done -- To view, visit http://gerrit.cloudera.org:8080/11571 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db Gerrit-Change-Number: 11571 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Thu, 04 Oct 2018 23:41:16 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API
Hello Mike Percy, Dan Burkert, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11571 to look at the new patch set (#3). Change subject: KUDU-2563: [spark] Use the scanner keepAlive API .. KUDU-2563: [spark] Use the scanner keepAlive API Adds scheduled keepAlive calls to the scanner in the KuduRDD RowIterator. The period in which the calls are made is configurable via keepAlivePeriodMs and has a default of 15 seconds (which is 1/4 the default scanner ttl). This implementation is similar to the Impala integration. It checks if a call to the keepAlive API is needed as it processes each row. Compared to a background thread, this has the downside of being less consistently scheduled and susceptible to scenarios in which a single row takes longer to process than the ttl. However, because the scanner is not thread safe, this is the most straightforward solution and has been proven to work. Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala 6 files changed, 139 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/11571/3 -- To view, visit http://gerrit.cloudera.org:8080/11571 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db Gerrit-Change-Number: 11571 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] docs: add a note about RAID-0
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11580 ) Change subject: docs: add a note about RAID-0 .. docs: add a note about RAID-0 Adds a note that a single RAID-0 device with multiple disks is not preferable to specifying multiple fs directories across multiple disks. Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 Reviewed-on: http://gerrit.cloudera.org:8080/11580 Reviewed-by: Adar Dembo Tested-by: Kudu Jenkins --- M docs/configuration.adoc 1 file changed, 5 insertions(+), 1 deletion(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/11580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 Gerrit-Change-Number: 11580 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] docs: add a note about RAID-0
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11580 ) Change subject: docs: add a note about RAID-0 .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 Gerrit-Change-Number: 11580 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Comment-Date: Thu, 04 Oct 2018 22:50:21 + Gerrit-HasComments: No
[kudu-CR] docs: add a note about RAID-0
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11580 ) Change subject: docs: add a note about RAID-0 .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/11580/2/docs/configuration.adoc File docs/configuration.adoc: http://gerrit.cloudera.org:8080/#/c/11580/2/docs/configuration.adoc@80 PS2, Line 80: If not specified, data blocks will : be placed in the directory specified by `--fs_wal_dir`. Nit: I'd put this ahead of the RAID note, since it deals with the semantics rather than the effects. -- To view, visit http://gerrit.cloudera.org:8080/11580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 Gerrit-Change-Number: 11580 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Comment-Date: Thu, 04 Oct 2018 22:33:34 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11571 ) Change subject: KUDU-2563: [spark] Use the scanner keepAlive API .. Patch Set 2: (3 comments) It would be good to double check that KuduScanner.keepAlive() is well behaved when the scan is "in between" two tablets. Ideally that would do nothing, but at the very least it shouldn't try to send a keep alive RPC into the ether and then throw an exception when that fails. http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala: http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@19 PS2, Line 19: import java.util.concurrent.Executors Can you double check that you still need all of these new imports? http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@146 PS2, Line 146: LOG.error(s"Calling keepAlive on ${scanner.currentTablet}") For debugging only? http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@147 PS2, Line 147: scanner.keepAlive() Is it OK if this fails and throws a KuduException? -- To view, visit http://gerrit.cloudera.org:8080/11571 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db Gerrit-Change-Number: 11571 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Thu, 04 Oct 2018 22:41:57 + Gerrit-HasComments: Yes
[kudu-CR] docs: add a note about RAID-0
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11580 ) Change subject: docs: add a note about RAID-0 .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/11580/2/docs/configuration.adoc File docs/configuration.adoc: http://gerrit.cloudera.org:8080/#/c/11580/2/docs/configuration.adoc@80 PS2, Line 80: es rather than delegating the : striping to a RAID-0 array. > Nit: I'd put this ahead of the RAID note, since it deals with the semantics Done -- To view, visit http://gerrit.cloudera.org:8080/11580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 Gerrit-Change-Number: 11580 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Comment-Date: Thu, 04 Oct 2018 22:37:50 + Gerrit-HasComments: Yes
[kudu-CR] docs: add a note about RAID-0
Hello Alex Rodoni, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11580 to look at the new patch set (#3). Change subject: docs: add a note about RAID-0 .. docs: add a note about RAID-0 Adds a note that a single RAID-0 device with multiple disks is not preferable to specifying multiple fs directories across multiple disks. Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 --- M docs/configuration.adoc 1 file changed, 5 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/80/11580/3 -- To view, visit http://gerrit.cloudera.org:8080/11580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 Gerrit-Change-Number: 11580 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11554 ) Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets .. Patch Set 10: (5 comments) http://gerrit.cloudera.org:8080/#/c/11554/10/src/kudu/tools/ksck-test.cc File src/kudu/tools/ksck-test.cc: http://gerrit.cloudera.org:8080/#/c/11554/10/src/kudu/tools/ksck-test.cc@21 PS10, Line 21: #include Unused? Or is it? http://gerrit.cloudera.org:8080/#/c/11554/10/src/kudu/tools/ksck-test.cc@1588 PS10, Line 1588: new_uuid, : std::make_shared(new_uuid)); nit: add a space http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc File src/kudu/tools/ksck_checksum.cc: http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@298 PS7, Line 298: ol > If I do that I need to cast 'FLAGS_...' to match whatever is inferred for ' Ah, thanks for doing that. I don't think I'm a fan of that; it's verbose and probably not worth the cycles to read/ponder as a developer coming back to this code. I'm in favor of reverting it if you agree http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc File src/kudu/tools/ksck_checksum.cc: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@353 PS9, Line 353: nit: spacing http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@354 PS9, Line 354: slot_counts_to_decrement.push_back(slots_open); : } These are redundant now that we're actually using the values we already DCHECKed -- To view, visit http://gerrit.cloudera.org:8080/11554 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476 Gerrit-Change-Number: 11554 Gerrit-PatchSet: 10 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 04 Oct 2018 21:20:06 + Gerrit-HasComments: Yes
[kudu-CR] docs: update docs for CFile checksum handling
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11581 ) Change subject: docs: update docs for CFile checksum handling .. docs: update docs for CFile checksum handling Notes that the behavior when encountering a CFile checksum has changed in 1.8.0. I've kept around the manual steps, since they are still valuable. Change-Id: I11ecfe2739122f80894c5bbba13de853d962754a Reviewed-on: http://gerrit.cloudera.org:8080/11581 Reviewed-by: Grant Henke Tested-by: Grant Henke Tested-by: Kudu Jenkins --- M docs/troubleshooting.adoc 1 file changed, 7 insertions(+), 5 deletions(-) Approvals: Grant Henke: Looks good to me, approved; Verified Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/11581 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I11ecfe2739122f80894c5bbba13de853d962754a Gerrit-Change-Number: 11581 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] docs: add a note about RAID-0
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11580 ) Change subject: docs: add a note about RAID-0 .. Patch Set 2: Verified+1 Overriding -1 since this is a docs patch -- To view, visit http://gerrit.cloudera.org:8080/11580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 Gerrit-Change-Number: 11580 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Comment-Date: Thu, 04 Oct 2018 20:44:18 + Gerrit-HasComments: No
[kudu-CR] [master] extra tests for the placement policy
Hello Will Berkeley, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11562 to look at the new patch set (#3). Change subject: [master] extra tests for the placement policy .. [master] extra tests for the placement policy Added a few more tests for the replica placement logic. This is to verify how an extra replica is placed in case of a tablet with RF=5. RF=5 test scenarios are interesting in exercising the replica placement logic vs RF=3 scenarios because placing two replicas into one location does not constitute a violation of placement policy in case of RF=5. So, those RF=5 scenarios allows to verify how the placement logic works during initial replica placement and while adding an extra replica when it's possible to place more than one replica into same location. This is a follow-up to ebb2852d99ed27c26e65c3569d5cb515754b2937. Change-Id: I49748eca01743fb94846ccd611c5f1c7ddb20c04 --- M src/kudu/master/placement_policy-test.cc M src/kudu/master/placement_policy.h 2 files changed, 200 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/11562/3 -- To view, visit http://gerrit.cloudera.org:8080/11562 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I49748eca01743fb94846ccd611c5f1c7ddb20c04 Gerrit-Change-Number: 11562 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley
[kudu-CR] docs: add a note about RAID-0
Andrew Wong has removed a vote on this change. Change subject: docs: add a note about RAID-0 .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/11580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 Gerrit-Change-Number: 11580 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [master] extra tests for the placement policy
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11562 ) Change subject: [master] extra tests for the placement policy .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/11562/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11562/1//COMMIT_MSG@11 PS1, Line 11: The test scenario for RF=5 is chosen to exercise the logic where two : replicas per a location does not constitute yet a placement policy : violation. > I'm not sure what this sentence says. Done http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc File src/kudu/master/placement_policy-test.cc: http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@598 PS1, Line 598: 3 > I think it's worth naming this as 'num_replicas', here and below. Done http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@601 PS1, Line 601: ASSERT_GE(1, m.count("A_ts0") + m.count("A_ts1")); : ASSERT_GE(1, m.count("B_ts0") + m.count("B_ts1")); : ASSERT_GE(1, m.count("C_ts0")); : ASSERT_GE(1, m.count("D_ts0")); : ASSERT_GE(1, m.count("E_ts0")); : ASSERT_EQ(3, m.count("A_ts0") + m.count("A_ts1") + : m.count("B_ts0") + m.count("B_ts1") + : m.count("C_ts0") + m.count("D_ts0") + m.count("E_ts0")); > Could you write a quick note explaining what the intent of the checks is, i Done http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@625 PS1, Line 625: talbet > tablet Done http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@697 PS1, Line 697: ASSERT_TRUE( : extra_ts->permanent_uuid() == "A_ts2" || : extra_ts->permanent_uuid() == "C_ts0") > Why is this expected? Among the locations where an additional replica can be placed, location A and location C have the least load. As for the preferences within location A, the replica is placed using power-of-two choices among the available servers. I added corresponding comment. http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@704 PS1, Line 704: // This is similar to PlaceExtraTablet_L5_TS10_RF5 but 16 tablet servers instead : // of 10 and slightly different replica distribution. > What specifically is this testing then? Actually, this assumed to be a test that verifies how evenly the replicas are placed among the servers in locations A and B. I updated the test. http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@726 PS1, Line 726: ASSERT_TRUE(extra_ts->permanent_uuid() == "A_ts1" || : extra_ts->permanent_uuid() == "A_ts2" || : extra_ts->permanent_uuid() == "A_ts3" || : extra_ts->permanent_uuid() == "B_ts1" || : extra_ts->permanent_uuid() == "B_ts2" || : extra_ts->permanent_uuid() == "B_ts3") > Why is this pick expected? Updated the overall description. -- To view, visit http://gerrit.cloudera.org:8080/11562 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I49748eca01743fb94846ccd611c5f1c7ddb20c04 Gerrit-Change-Number: 11562 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 04 Oct 2018 20:35:03 + Gerrit-HasComments: Yes
[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11554 ) Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets .. Patch Set 7: (18 comments) http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG@15 PS9, Line 15: before > nit: before --> behind ? 'before' is ok because we're talking about (something sort of like) time. http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG@60 PS9, Line 60: y large c > BTW, what happens if _some_ of the peers are unavailable? Do we need to ad It's not special. http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck.h File src/kudu/tools/ksck.h: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck.h@353 PS9, Line 353: timestamp_ = timestamp; : } : > Is it used at all? Done http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.h File src/kudu/tools/ksck_checksum.h: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.h@256 PS9, Line 256: // be checksummed based in available slots on tablet servers. : gscoped_ptr find_tablets_t > nit: why not to move into std::atomic with those as well? Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc File src/kudu/tools/ksck_checksum.cc: http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@358 PS7, Line 358: shared_ptr messenger; : rpc::MessengerBuilder builder("timestamp update"); : RETURN_NOT_OK(builder.Build()); > I never really saw the point of the class hierarchy here; we could have pok Done http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc File src/kudu/tools/ksck_checksum.cc: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@66 PS9, Line 66: DEFINE_int32(max_progress_report_wait_ms, 5000, : "Maximum number of milliseconds to wait between progress reports. " : "Used to speed up tests."); : TAG_FLAG(max_progress_report_wait_ms, hidden); > Could this be also useful while using the tool in the field? I don't want to expose it because it's meant for tests. The updates are very short, there's really no reason to get them more or less often, and it just pollutes the option list with one nobody is really gonna use. http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@70 PS9, Line 70: timestamp_update_period_ms > Does it make sense to keep it in milliseconds? Maybe, a second-grain resol I guess, but I don't even think it's worth changing. http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@86 PS9, Line 86: when the all the checksums for its replicas b > nit: maybe, change to 'when the checksumming for all its replicas begins' I don't think there's a meaningful difference there. http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@101 PS9, Line 101: listed in 'tservers' > What about the reverse: is it OK to have a tablet server in 'tservers' that Yes. It could be one with no replicas on it. http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@106 PS9, Line 106: CHECK(num_replicas); > nit: maybe, DCHECK is enough, since it would crash anyway further down if i I think the crash is cleaner if it's a CHECK failure. This isn't a hot path, anyway. http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@110 PS9, Line 110: tserver_uuid_set > Consider using helper functions from map-util.h: those present more express Done http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@187 PS9, Line 187: bool KsckChecksumManager::HasOpenTsSlotsUnlocked() { > nit: add const specifier? Done http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@383 PS9, Line 383: r (const auto& ts : > nit: use helper functions from map-util.h Done http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc File src/kudu/tools/ksck_remote-test.cc: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@429 PS9, Line 429: or > and Done http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@428 PS9, Line 428: #if defined(THREAD_SANITIZER) || defined(ADDRESS_SANITIZER) : LOG(WARNING) << "test is skipped in TSAN or ASAN builds"; : return; : #endif > nit: shift this left one indent Why? That looks really funny. http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@476 PS9, Line 476: MonoDelta::FromSeconds(timeout_sec), : MonoDelta::FromSeconds(timeout_sec), : scan_concurrency,
[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API
Hello Mike Percy, Dan Burkert, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11571 to look at the new patch set (#2). Change subject: KUDU-2563: [spark] Use the scanner keepAlive API .. KUDU-2563: [spark] Use the scanner keepAlive API Adds scheduled keepAlive calls to the scanner in the KuduRDD RowIterator. The period in which the calls are made is configurable via keepAlivePeriodMs and has a default of 15 seconds (which is 1/4 the default scanner ttl). This implementation is similar to the Impala integration. It checks if a call to the keepAlive API is needed as it processes each row. Compared to a background thread, this has the downside of being less consistently scheduled and susceptible to scenarios in which a single row takes longer to process than the ttl. However, because the scanner is not thread safe, this is the most straightforward solution and has been proven to work. Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala 6 files changed, 152 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/11571/2 -- To view, visit http://gerrit.cloudera.org:8080/11571 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db Gerrit-Change-Number: 11571 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] docs: add a note about RAID-0
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11580 ) Change subject: docs: add a note about RAID-0 .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc File docs/configuration.adoc: http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc@76 PS1, Line 76: Note that while RAID-0 : devices with multiple drives may perform better than single drives, using a : single RAID-0 device instead of specifying multiple data directories may reduce : the amount of parallelism with which Kudu can operate. Maybe: "Note that while a single data directory backed by a RAID-0 device will outperform a single data directory backed by a single device, it is better to let Kudu manage its own striping over multiple storage devices rather than delegating the striping to a RAID-0 array." -- To view, visit http://gerrit.cloudera.org:8080/11580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 Gerrit-Change-Number: 11580 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Comment-Date: Thu, 04 Oct 2018 19:56:10 + Gerrit-HasComments: Yes
[kudu-CR] docs: update docs for CFile checksum handling
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11581 ) Change subject: docs: update docs for CFile checksum handling .. Patch Set 1: Verified+1 Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11581 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11ecfe2739122f80894c5bbba13de853d962754a Gerrit-Change-Number: 11581 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Comment-Date: Thu, 04 Oct 2018 20:08:23 + Gerrit-HasComments: No
[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets
Hello Tidy Bot, Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Adar Dembo, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11554 to look at the new patch set (#10). Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets .. [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets ksck checksum scans allow the user to checksum with snapshot scans, so that a checksum can be done even as tablets are mutated. It also allows users to omit a snapshot timestamp. Previously, in this case, the snapshot timestamp would be retrieved from some healthy tablet server at the beginning of the checksum process, and used for every replica. This didn't work well for checksumming large tables, because eventually the snapshot timestamp fell before the ancient history mark, and subsequent checksums scans would not be accepted by the tablet servers. This changes how checksum scans work to address this problem: 1. A background process periodically updates timestamps from tablet servers. 2. The checksum process is reorganized so the replicas of one tablet are checksummed together. 3. When a tablet is about to be checksummed, and the checksum scan is a snapshot scan with no user-provided timestamp, the tablet is assigned an up-to-date timestamp from one of the tablet servers that hosts a replica. Every replica is then checksummed using this snapshot timestamp. 4. The original default timeout of 3600 seconds for a checksum scan is too low, but it didn't really matter because the default tablet history max age was 900 seconds. Now that checksum scans can continue for many hours, the default timeout is raised to 86400 seconds (1 day), and a new idle timeout is added. If a checksum process does not checksum an additional row for this idle timeout (default 10 minutes), it will idle time out. Note that there is a new scheduling problem given #2: each tablet server has a fixed number of slots for checksum scans, but every tablet server hosting a replica must have a slot available before any replica's checksum can start, so deciding in which order to checksum tablets and how to find which are available to schedule is important. Given that the bulk of the time in checksums is occupied waiting for tablet servers to read lots of data off disk, materialize it as rows, and checksum it, it's worth spending a lot of effort to make sure the cluster is fully utilized given the scan concurrency constraints. So, the tool uses a brute force approach and simply checks all tablets to see which can be checksummed, any time a replica checksum finishes and frees a slot. Tablets are considered in tablet id order. Since tablet ids are UUIDs, there should be no correlation between a tablet's id and how its replicas are distributed across tablet servers. There are several tests added: 1. For the KUDU-2179 fix itself. 2. For the idle timeout. 3. For when a checksum finds mismatches. Yes, we didn't have a test for this before. After adding this test I saw that the output is a little confusing since it reported the number of replicas with mismatches rather than the number of tablets, so I altered the output to fix that. 4. A couple of tests exercising situations when all tablet servers are unavailable and when all peers of a tablet are unavailable. I also checksummed a very large cluster with 500TB of data or so, across about 37000 replicas. The checksum scan completed successfully after more than 12 hours. Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476 --- M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_checksum.cc M src/kudu/tools/ksck_checksum.h M src/kudu/tools/ksck_remote-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/ksck_remote.h 7 files changed, 871 insertions(+), 275 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/11554/10 -- To view, visit http://gerrit.cloudera.org:8080/11554 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476 Gerrit-Change-Number: 11554 Gerrit-PatchSet: 10 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] docs: update docs for CFile checksum handling
Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11581 Change subject: docs: update docs for CFile checksum handling .. docs: update docs for CFile checksum handling Notes that the behavior when encountering a CFile checksum has changed in 1.8.0. I've kept around the manual steps, since they are still valuable. Change-Id: I11ecfe2739122f80894c5bbba13de853d962754a --- M docs/troubleshooting.adoc 1 file changed, 7 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/11581/1 -- To view, visit http://gerrit.cloudera.org:8080/11581 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I11ecfe2739122f80894c5bbba13de853d962754a Gerrit-Change-Number: 11581 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong
[kudu-CR] docs: add a note about RAID-0
Hello Alex Rodoni, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11580 to look at the new patch set (#2). Change subject: docs: add a note about RAID-0 .. docs: add a note about RAID-0 Adds a note that a single RAID-0 device with multiple disks is not preferable to specifying multiple fs directories across multiple disks. Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 --- M docs/configuration.adoc 1 file changed, 6 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/80/11580/2 -- To view, visit http://gerrit.cloudera.org:8080/11580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 Gerrit-Change-Number: 11580 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] docs: add a note about RAID-0
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11580 ) Change subject: docs: add a note about RAID-0 .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc File docs/configuration.adoc: http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc@77 PS1, Line 77: R > the ones with Went with a different sentence entirely. http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc@76 PS1, Line 76: : Note that while a single data directory backed by a RAID-0 array will : outperform a single data directory backed by a single storage device, it is : better to let Kudu manage its own striping over multip > Maybe: "Note that while a single data directory backed by a RAID-0 device w Taken near-verbatim http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc@76 PS1, Line 76: : Note that while a single data directory backed by a RAID-0 array will : outperform a single data directory backed by a single storage device, it is : better to let Kudu manage its own striping over multipl > Maybe put this in a separate Note paragraph below? I considered that, but I think here is more appropriate since we discuss striping already. -- To view, visit http://gerrit.cloudera.org:8080/11580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 Gerrit-Change-Number: 11580 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Comment-Date: Thu, 04 Oct 2018 20:01:14 + Gerrit-HasComments: Yes
[kudu-CR] docs: add a note about RAID-0
Alex Rodoni has posted comments on this change. ( http://gerrit.cloudera.org:8080/11580 ) Change subject: docs: add a note about RAID-0 .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc File docs/configuration.adoc: http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc@77 PS1, Line 77: the ones with http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc@76 PS1, Line 76: Note that while RAID-0 : devices with multiple drives may perform better than single drives, using a : single RAID-0 device instead of specifying multiple data directories may reduce : the amount of parallelism with which Kudu can operate. Maybe put this in a separate Note paragraph below? -- To view, visit http://gerrit.cloudera.org:8080/11580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 Gerrit-Change-Number: 11580 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alex Rodoni Gerrit-Reviewer: Kudu Jenkins Gerrit-Comment-Date: Thu, 04 Oct 2018 19:59:07 + Gerrit-HasComments: Yes
[kudu-CR] docs: add a note about RAID-0
Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11580 Change subject: docs: add a note about RAID-0 .. docs: add a note about RAID-0 Adds a note that a single RAID-0 device with multiple disks is not preferable to specifying multiple fs directories across multiple disks. Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 --- M docs/configuration.adoc 1 file changed, 4 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/80/11580/1 -- To view, visit http://gerrit.cloudera.org:8080/11580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71 Gerrit-Change-Number: 11580 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong
[kudu-CR] [master] extra tests for the placement policy
Hello Will Berkeley, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11562 to look at the new patch set (#2). Change subject: [master] extra tests for the placement policy .. [master] extra tests for the placement policy Added a few more tests for the replica placement logic. This is to verify how an extra replica is placed in case of a tablet with RF=5. The test scenario for RF=5 is chosen to exercise the logic where two replicas per a location does not constitute yet a placement policy violation. This is a follow-up to ebb2852d99ed27c26e65c3569d5cb515754b2937. Change-Id: I49748eca01743fb94846ccd611c5f1c7ddb20c04 --- M src/kudu/master/placement_policy-test.cc M src/kudu/master/placement_policy.h 2 files changed, 200 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/11562/2 -- To view, visit http://gerrit.cloudera.org:8080/11562 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I49748eca01743fb94846ccd611c5f1c7ddb20c04 Gerrit-Change-Number: 11562 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Will Berkeley
[kudu-CR] Remove Pandas as a Python Client Requirement
Jordan Birdsell has posted comments on this change. ( http://gerrit.cloudera.org:8080/11557 ) Change subject: Remove Pandas as a Python Client Requirement .. Patch Set 5: Code-Review+1 I'm good with this change but would like someone else to provide the 2nd +1 since there has been a lot of feedback here. -- To view, visit http://gerrit.cloudera.org:8080/11557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8 Gerrit-Change-Number: 11557 Gerrit-PatchSet: 5 Gerrit-Owner: a...@phdata.io Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jordan Birdsell Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: a...@phdata.io Gerrit-Comment-Date: Thu, 04 Oct 2018 18:02:01 + Gerrit-HasComments: No
[kudu-CR] Remove Pandas as a Python Client Requirement
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11557 ) Change subject: Remove Pandas as a Python Client Requirement .. Patch Set 5: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/11557/3/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: http://gerrit.cloudera.org:8080/#/c/11557/3/build-support/jenkins/build-and-test.sh@506 PS3, Line 506: # We need to install Pandas manually since its not a required package but is needed > Done Missed its -> it's. But not a big deal. -- To view, visit http://gerrit.cloudera.org:8080/11557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8 Gerrit-Change-Number: 11557 Gerrit-PatchSet: 5 Gerrit-Owner: a...@phdata.io Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jordan Birdsell Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: a...@phdata.io Gerrit-Comment-Date: Thu, 04 Oct 2018 18:17:23 + Gerrit-HasComments: Yes
[kudu-CR] Remove Pandas as a Python Client Requirement
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11557 ) Change subject: Remove Pandas as a Python Client Requirement .. Remove Pandas as a Python Client Requirement This is to remove the pandas dependency for users. We will put the pandas dependency in the jenkins script build-and-test.sh as a workaround. This is based on the initial work done here https://gerrit.cloudera.org/#/c/10809/4 Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8 Reviewed-on: http://gerrit.cloudera.org:8080/11557 Tested-by: Kudu Jenkins Reviewed-by: Jordan Birdsell Reviewed-by: Adar Dembo --- M build-support/jenkins/build-and-test.sh M python/kudu/tests/test_scanner.py M python/requirements.txt M python/setup.py 4 files changed, 23 insertions(+), 12 deletions(-) Approvals: Kudu Jenkins: Verified Jordan Birdsell: Looks good to me, but someone else must approve Adar Dembo: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/11557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8 Gerrit-Change-Number: 11557 Gerrit-PatchSet: 6 Gerrit-Owner: a...@phdata.io Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jordan Birdsell Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: a...@phdata.io
[kudu-CR] KUDU-1592 Make warnings about FBM more ominous
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11579 ) Change subject: KUDU-1592 Make warnings about FBM more ominous .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/11579/1/docs/troubleshooting.adoc File docs/troubleshooting.adoc: http://gerrit.cloudera.org:8080/#/c/11579/1/docs/troubleshooting.adoc@85 PS1, Line 85: This results in a very large number of file descriptors that : the tablet server needs to open even with small amounts of data, which is a : limited resource and the tablet servers can easily exhaust it with file block manager. This isn't true; the FileCache will recycle file descriptors on an LRU basis in order to keep the total open fd count below a certain threshold. That doesn't detract from the larger argument (that it's not as performant) though. http://gerrit.cloudera.org:8080/#/c/11579/1/docs/troubleshooting.adoc@89 PS1, Line 89: One more reason why file block manager should be avoided is that it's : *impossible to switch between block managers* without wiping and reinitializing : the tablet servers. Combine this into the previous paragraph. http://gerrit.cloudera.org:8080/#/c/11579/1/src/kudu/fs/fs_manager.cc File src/kudu/fs/fs_manager.cc: http://gerrit.cloudera.org:8080/#/c/11579/1/src/kudu/fs/fs_manager.cc@70 PS1, Line 70: File Nit: "The file block manager..." -- To view, visit http://gerrit.cloudera.org:8080/11579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15adf42dd7e6376a85cdb178ad460f239a4f87a1 Gerrit-Change-Number: 11579 Gerrit-PatchSet: 1 Gerrit-Owner: Attila Bukor Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 04 Oct 2018 18:21:32 + Gerrit-HasComments: Yes
[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11554 ) Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc File src/kudu/tools/ksck_checksum.cc: http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@358 PS7, Line 358: return true; : } : > There is no Messenger in a KsckCluster, the base class this class works wit I never really saw the point of the class hierarchy here; we could have poked enough holes in ksck to simplify testing without inheritance. Anyway, I think your suggestion is reasonable. How shady it will end up sort of depends on whether rpc::PeriodicTimer needs to be used by whatever tests use a MockKsckCluster. If it isn't used, you could get away with returning nullptr in MockKsckCluster::messenger(), which would ease the stinging. -- To view, visit http://gerrit.cloudera.org:8080/11554 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476 Gerrit-Change-Number: 11554 Gerrit-PatchSet: 9 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 04 Oct 2018 18:27:34 + Gerrit-HasComments: Yes
[kudu-CR] [master] extra tests for the placement policy
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11562 ) Change subject: [master] extra tests for the placement policy .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/11562/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11562/1//COMMIT_MSG@11 PS1, Line 11: The test scenario for RF=5 is chosen to exercise the logic where two : replicas per a location does not constitute yet a placement policy : violation. I'm not sure what this sentence says. http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc File src/kudu/master/placement_policy-test.cc: http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@598 PS1, Line 598: 3 I think it's worth naming this as 'num_replicas', here and below. http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@601 PS1, Line 601: ASSERT_GE(1, m.count("A_ts0") + m.count("A_ts1")); : ASSERT_GE(1, m.count("B_ts0") + m.count("B_ts1")); : ASSERT_GE(1, m.count("C_ts0")); : ASSERT_GE(1, m.count("D_ts0")); : ASSERT_GE(1, m.count("E_ts0")); : ASSERT_EQ(3, m.count("A_ts0") + m.count("A_ts1") + : m.count("B_ts0") + m.count("B_ts1") + : m.count("C_ts0") + m.count("D_ts0") + m.count("E_ts0")); Could you write a quick note explaining what the intent of the checks is, i.e. "Make sure that X". http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@625 PS1, Line 625: talbet tablet http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@697 PS1, Line 697: ASSERT_TRUE( : extra_ts->permanent_uuid() == "A_ts2" || : extra_ts->permanent_uuid() == "C_ts0") Why is this expected? http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@704 PS1, Line 704: // This is similar to PlaceExtraTablet_L5_TS10_RF5 but 16 tablet servers instead : // of 10 and slightly different replica distribution. What specifically is this testing then? http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@726 PS1, Line 726: ASSERT_TRUE(extra_ts->permanent_uuid() == "A_ts1" || : extra_ts->permanent_uuid() == "A_ts2" || : extra_ts->permanent_uuid() == "A_ts3" || : extra_ts->permanent_uuid() == "B_ts1" || : extra_ts->permanent_uuid() == "B_ts2" || : extra_ts->permanent_uuid() == "B_ts3") Why is this pick expected? -- To view, visit http://gerrit.cloudera.org:8080/11562 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I49748eca01743fb94846ccd611c5f1c7ddb20c04 Gerrit-Change-Number: 11562 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 04 Oct 2018 17:43:40 + Gerrit-HasComments: Yes
[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11554 ) Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets .. Patch Set 9: (18 comments) Skimmed trough for a first look, will take another look later today. Overall looks pretty good, as of now mostly nits. http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG@15 PS9, Line 15: before nit: before --> behind ? http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG@60 PS9, Line 60: all peers BTW, what happens if _some_ of the peers are unavailable? Do we need to add a test for that or there is nothing special about that? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck.h File src/kudu/tools/ksck.h: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck.h@353 PS9, Line 353: void set_timestamp(uint64_t timestamp) { : timestamp_ = timestamp; : } Is it used at all? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.h File src/kudu/tools/ksck_checksum.h: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.h@256 PS9, Line 256: AtomicInt rows_summed_; : AtomicInt disk_bytes_summed_; nit: why not to move into std::atomic with those as well? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc File src/kudu/tools/ksck_checksum.cc: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@66 PS9, Line 66: DEFINE_int32(max_progress_report_wait_ms, 5000, : "Maximum number of milliseconds to wait between progress reports. " : "Used to speed up tests."); : TAG_FLAG(max_progress_report_wait_ms, hidden); Could this be also useful while using the tool in the field? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@70 PS9, Line 70: timestamp_update_period_ms Does it make sense to keep it in milliseconds? Maybe, a second-grain resolution would be enough? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@86 PS9, Line 86: when all the checksums for its replicas begin nit: maybe, change to 'when the checksumming for all its replicas begins' http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@101 PS9, Line 101: listed in 'tservers' What about the reverse: is it OK to have a tablet server in 'tservers' that is not referenced by any tablet from the 'tablet_infos'? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@106 PS9, Line 106: CHECK(num_replicas); nit: maybe, DCHECK is enough, since it would crash anyway further down if it were null. http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@110 PS9, Line 110: tserver_uuid_set Consider using helper functions from map-util.h: those present more expressive semantics. E.g., here it's supposed to be InsertOrDie(_uuid_set, tserver->uuid()), right? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@187 PS9, Line 187: bool KsckChecksumManager::HasOpenTsSlotsUnlocked() { nit: add const specifier? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@383 PS9, Line 383: replica_uuids.insert nit: use helper functions from map-util.h http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc File src/kudu/tools/ksck_remote-test.cc: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@416 PS9, Line 416: TestChecksumSnapshotLastingLongerThanAHM Did you try to run this with --stress-cpu-threads=8 and many iterations using dist-test? http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@429 PS9, Line 429: or and http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@428 PS9, Line 428: #if defined(THREAD_SANITIZER) || defined(ADDRESS_SANITIZER) : LOG(WARNING) << "test is skipped in TSAN or ASAN builds"; : return; : #endif nit: shift this left one indent http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@476 PS9, Line 476: MonoDelta::FromSeconds(timeout_sec), : MonoDelta::FromSeconds(timeout_sec), : scan_concurrency, : use_snapshot, : KsckChecksumOptions::kCurrentTimestamp))); nit: indent should be 4 spaces from the outer scope's indent. http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote.h File src/kudu/tools/ksck_remote.h: http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote.h@130 PS9, Line 130: style nit: indent should be 4 spaces from the outer scope
[kudu-CR] Remove Pandas as a Python Client Requirement
a...@phdata.io has posted comments on this change. ( http://gerrit.cloudera.org:8080/11557 ) Change subject: Remove Pandas as a Python Client Requirement .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/11557/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11557/3//COMMIT_MSG@13 PS3, Line 13: > initial Done -- To view, visit http://gerrit.cloudera.org:8080/11557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8 Gerrit-Change-Number: 11557 Gerrit-PatchSet: 5 Gerrit-Owner: a...@phdata.io Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jordan Birdsell Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: a...@phdata.io Gerrit-Comment-Date: Thu, 04 Oct 2018 15:18:40 + Gerrit-HasComments: Yes
[kudu-CR] Remove Pandas as a Python Client Requirement
Hello Kudu Jenkins, Jordan Birdsell, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11557 to look at the new patch set (#5). Change subject: Remove Pandas as a Python Client Requirement .. Remove Pandas as a Python Client Requirement This is to remove the pandas dependency for users. We will put the pandas dependency in the jenkins script build-and-test.sh as a workaround. This is based on the initial work done here https://gerrit.cloudera.org/#/c/10809/4 Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8 --- M build-support/jenkins/build-and-test.sh M python/kudu/tests/test_scanner.py M python/requirements.txt M python/setup.py 4 files changed, 23 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/57/11557/5 -- To view, visit http://gerrit.cloudera.org:8080/11557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8 Gerrit-Change-Number: 11557 Gerrit-PatchSet: 5 Gerrit-Owner: a...@phdata.io Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jordan Birdsell Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: a...@phdata.io
[kudu-CR] Remove Pandas as a Python Client Requirement
a...@phdata.io has posted comments on this change. ( http://gerrit.cloudera.org:8080/11557 ) Change subject: Remove Pandas as a Python Client Requirement .. Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/11557/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11557/3//COMMIT_MSG@12 PS3, Line 12: work done here https://gerrit.cloudera.org/#/c/10809/4 > dependency Done http://gerrit.cloudera.org:8080/#/c/11557/3/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: http://gerrit.cloudera.org:8080/#/c/11557/3/build-support/jenkins/build-and-test.sh@506 PS3, Line 506: # We need to install Pandas manually since its not a required package but is needed > Nit: it's Done http://gerrit.cloudera.org:8080/#/c/11557/3/build-support/jenkins/build-and-test.sh@508 PS3, Line 508: # pandas 0.18 dropped support for python 2.6. > Can you pin this to 0.18, which was the last version to support Python 2.6? Done http://gerrit.cloudera.org:8080/#/c/11557/3/build-support/jenkins/build-and-test.sh@567 PS3, Line 567: # > Nit: leading whitespace here. Done http://gerrit.cloudera.org:8080/#/c/11557/3/python/kudu/tests/test_scanner.py File python/kudu/tests/test_scanner.py: http://gerrit.cloudera.org:8080/#/c/11557/3/python/kudu/tests/test_scanner.py@385 PS3, Line 385: @pytest.mark.skipif((not(kudu.CLIENT_SUPPORTS_PANDAS) and : (not(kudu.CLIENT_SUPPORTS_DECIMAL))), > Can this be formatted like this? Done http://gerrit.cloudera.org:8080/#/c/11557/3/python/setup.py File python/setup.py: http://gerrit.cloudera.org:8080/#/c/11557/3/python/setup.py@197 PS3, Line 197: # pytest 3.3 [1] and pytest-timeout 1.2.1 [2] dropped : # support for python 2.6. : # : # 1. https://docs.pytest.org/en/latest/changelog.html#id164 : # 2. https://pypi.org/project/pytest-timeout/#id5 : tests_require=['pytest >=2.8,<3.3', > Update this to remove the pandas reference. Done -- To view, visit http://gerrit.cloudera.org:8080/11557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8 Gerrit-Change-Number: 11557 Gerrit-PatchSet: 4 Gerrit-Owner: a...@phdata.io Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jordan Birdsell Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: a...@phdata.io Gerrit-Comment-Date: Thu, 04 Oct 2018 15:17:22 + Gerrit-HasComments: Yes
[kudu-CR] Remove Pandas as a Python Client Requirement
Hello Kudu Jenkins, Jordan Birdsell, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11557 to look at the new patch set (#4). Change subject: Remove Pandas as a Python Client Requirement .. Remove Pandas as a Python Client Requirement This is to remove the pandas dependency for users. We will put the pandas dependency in the jenkins script build-and-test.sh as a workaround. This is based on the intial work done here https://gerrit.cloudera.org/#/c/10809/4 Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8 --- M build-support/jenkins/build-and-test.sh M python/kudu/tests/test_scanner.py M python/requirements.txt M python/setup.py 4 files changed, 23 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/57/11557/4 -- To view, visit http://gerrit.cloudera.org:8080/11557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8 Gerrit-Change-Number: 11557 Gerrit-PatchSet: 4 Gerrit-Owner: a...@phdata.io Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Jordan Birdsell Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets
Hello Tidy Bot, Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Adar Dembo, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11554 to look at the new patch set (#9). Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets .. [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets ksck checksum scans allow the user to checksum with snapshot scans, so that a checksum can be done even as tablets are mutated. It also allows users to omit a snapshot timestamp. Previously, in this case, the snapshot timestamp would be retrieved from some healthy tablet server at the beginning of the checksum process, and used for every replica. This didn't work well for checksumming large tables, because eventually the snapshot timestamp fell before the ancient history mark, and subsequent checksums scans would not be accepted by the tablet servers. This changes how checksum scans work to address this problem: 1. A background process periodically updates timestamps from tablet servers. 2. The checksum process is reorganized so the replicas of one tablet are checksummed together. 3. When a tablet is about to be checksummed, and the checksum scan is a snapshot scan with no user-provided timestamp, the tablet is assigned an up-to-date timestamp from one of the tablet servers that hosts a replica. Every replica is then checksummed using this snapshot timestamp. 4. The original default timeout of 3600 seconds for a checksum scan is too low, but it didn't really matter because the default tablet history max age was 900 seconds. Now that checksum scans can continue for many hours, the default timeout is raised to 86400 seconds (1 day), and a new idle timeout is added. If a checksum process does not checksum an additional row for this idle timeout (default 10 minutes), it will idle time out. Note that there is a new scheduling problem given #2: each tablet server has a fixed number of slots for checksum scans, but every tablet server hosting a replica must have a slot available before any replica's checksum can start, so deciding in which order to checksum tablets and how to find which are available to schedule is important. Given that the bulk of the time in checksums is occupied waiting for tablet servers to read lots of data off disk, materialize it as rows, and checksum it, it's worth spending a lot of effort to make sure the cluster is fully utilized given the scan concurrency constraints. So, the tool uses a brute force approach and simply checks all tablets to see which can be checksummed, any time a replica checksum finishes and frees a slot. Tablets are considered in tablet id order. Since tablet ids are UUIDs, there should be no correlation between a tablet's id and how its replicas are distributed across tablet servers. There are several tests added: 1. For the KUDU-2179 fix itself. 2. For the idle timeout. 3. For when a checksum finds mismatches. Yes, we didn't have a test for this before. After adding this test I saw that the output is a little confusing since it reported the number of replicas with mismatches rather than the number of tablets, so I altered the output to fix that. 4. A couple of tests exercising situations when all tablet servers are unavailable and when all peers of a tablet are unavailable. I also checksummed a very large cluster with 500TB of data or so, across about 37000 replicas. The checksum scan completed successfully after more than 12 hours. Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476 --- M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_checksum.cc M src/kudu/tools/ksck_checksum.h M src/kudu/tools/ksck_remote-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/ksck_remote.h 7 files changed, 846 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/11554/9 -- To view, visit http://gerrit.cloudera.org:8080/11554 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476 Gerrit-Change-Number: 11554 Gerrit-PatchSet: 9 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1592 Make warnings about FBM more ominous
Hello Jean-Daniel Cryans, Adar Dembo, Grant Henke, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/11579 to review the following change. Change subject: KUDU-1592 Make warnings about FBM more ominous .. KUDU-1592 Make warnings about FBM more ominous The documentation warned about scalability issues with file block manager, but it was rather lightly worded and so was the warning message of a hole punch test failure. This commit removes the actual flag necessary to enable FBM from the hole punching tests' warning to force users look it up in the documentation instead of blindly applying it, adds a note about it being unsuitable for production use to the flag's description and expands the warning in the troubleshooting section. Change-Id: I15adf42dd7e6376a85cdb178ad460f239a4f87a1 --- M docs/troubleshooting.adoc M src/kudu/fs/data_dirs.cc M src/kudu/fs/fs_manager.cc 3 files changed, 32 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/11579/1 -- To view, visit http://gerrit.cloudera.org:8080/11579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I15adf42dd7e6376a85cdb178ad460f239a4f87a1 Gerrit-Change-Number: 11579 Gerrit-PatchSet: 1 Gerrit-Owner: Attila Bukor Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans
[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11554 ) Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets .. Patch Set 7: (18 comments) http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h File src/kudu/tools/ksck_checksum.h: http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h@45 PS7, Line 45: } > nit: add "// namespace rpc" Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h@136 PS7, Line 136: static Status New(KsckChecksumOptions opts, > warning: function 'kudu::tools::KsckChecksumManager::New' has a definition Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h@140 PS7, Line 140: ~KsckChecksumManager() = default; > Is this needed? Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h@169 PS7, Line 169: const MonoDelta& timeout, : const MonoDelta& idle_timeout, > Why not use the `timeout` and `idle_timeout` in `opts_`? Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h@211 PS7, Line 211: // Initialize 'ts_open_slots_map_'. : void InitializeTsSlotsMap(); > Does it matter whether this is called more than once? If not, maybe name it It shouldn't be called more than once since it sets the number of slots to their initial values and other methods manage them from there. http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h@256 PS7, Line 256: in > nit: on Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc File src/kudu/tools/ksck_checksum.cc: http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@86 PS7, Line 86: when the all > nit: extra word Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@118 PS7, Line 118: > nit: spacing Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@256 PS7, Line 256: int64_t delta_rows, int64_t delta_bytes > Let's DCHECK that these are positive. Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@270 PS7, Line 270: auto& tablet_result = LookupOrInsert(_, > We've got emplace-based functions in map-util.h; you should prefer them to Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@279 PS7, Line 279: boost::bind(::StartTabletChecksums, this)), > Prefer std::bind; I think std::function can implicitly cast to boost::funct Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@293 PS7, Line 293: MonoTime::Now() > nit: reuse `start`? Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@298 PS7, Line 298: int > No more `auto`? If I do that I need to cast 'FLAGS_...' to match whatever is inferred for 'rem_ms', or cast 'rem_ms' to be an int32_t, removing the point of auto. I'll change it to the former alternative and you can see. http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@304 PS7, Line 304: > nit: spacing Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@341 PS7, Line 341: auto& slots_open = FindOrDie(ts_slots_open_map_, replica->ts_uuid()); > nit: perhaps put these into a vector so you don't have to re-FindOrDie them Done http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@358 PS7, Line 358: shared_ptr messenger; : rpc::MessengerBuilder builder("timestamp update"); : RETURN_NOT_OK(builder.Build()); > Can't you reuse the Messenger that ksck uses for RPCs? There is no Messenger in a KsckCluster, the base class this class works with. There's only a Messenger in a RemoteKsckCluster, and it's held privately. I guess I can add a virtual KsckCluster::messenger() getter that returns a shared_ptr to a KsckCluster's Messenger, which just constructs a new Messenger when called on a MockKsckCluster? That seems a little shady. Does it sound worth it to you? http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@402 PS7, Line 402: VLOG(1) << LogPrefix(tablet_info.tablet->id()) << "Using timestamp " : << timestamp_for_tablet; > What should happen if `timestamp_for_tablet` is kCurrentTimestamp? I know w In that case we probably ought to short-circuit the checksum with an error. http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@620 PS7, Line 620: tablet_infos, : tablet_servers, : )); > nit: spacing Done -- To view, visit http://gerrit.cloudera.org:8080/11554 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master
[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets
Hello Tidy Bot, Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Adar Dembo, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11554 to look at the new patch set (#8). Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets .. [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets ksck checksum scans allow the user to checksum with snapshot scans, so that a checksum can be done even as tablets are mutated. It also allows users to omit a snapshot timestamp. Previously, in this case, the snapshot timestamp would be retrieved from some healthy tablet server at the beginning of the checksum process, and used for every replica. This didn't work well for checksumming large tables, because eventually the snapshot timestamp fell before the ancient history mark, and subsequent checksums scans would not be accepted by the tablet servers. This changes how checksum scans work to address this problem: 1. A background process periodically updates timestamps from tablet servers. 2. The checksum process is reorganized so the replicas of one tablet are checksummed together. 3. When a tablet is about to be checksummed, and the checksum scan is a snapshot scan with no user-provided timestamp, the tablet is assigned an up-to-date timestamp from one of the tablet servers that hosts a replica. Every replica is then checksummed using this snapshot timestamp. 4. The original default timeout of 3600 seconds for a checksum scan is too low, but it didn't really matter because the default tablet history max age was 900 seconds. Now that checksum scans can continue for many hours, the default timeout is raised to 86400 seconds (1 day), and a new idle timeout is added. If a checksum process does not checksum an additional row for this idle timeout (default 10 minutes), it will idle time out. Note that there is a new scheduling problem given #2: each tablet server has a fixed number of slots for checksum scans, but every tablet server hosting a replica must have a slot available before any replica's checksum can start, so deciding in which order to checksum tablets and how to find which are available to schedule is important. Given that the bulk of the time in checksums is occupied waiting for tablet servers to read lots of data off disk, materialize it as rows, and checksum it, it's worth spending a lot of effort to make sure the cluster is fully utilized given the scan concurrency constraints. So, the tool uses a brute force approach and simply checks all tablets to see which can be checksummed, any time a replica checksum finishes and frees a slot. Tablets are considered in tablet id order. Since tablet ids are UUIDs, there should be no correlation between a tablet's id and how its replicas are distributed across tablet servers. There are several tests added: 1. For the KUDU-2179 fix itself. 2. For the idle timeout. 3. For when a checksum finds mismatches. Yes, we didn't have a test for this before. After adding this test I saw that the output is a little confusing since it reported the number of replicas with mismatches rather than the number of tablets, so I altered the output to fix that. 4. A couple of tests exercising situations when all tablet servers are unavailable and when all peers of a tablet are unavailable. I also checksummed a very large cluster with 500TB of data or so, across about 37000 replicas. The checksum scan completed successfully after more than 12 hours. Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476 --- M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_checksum.cc M src/kudu/tools/ksck_checksum.h M src/kudu/tools/ksck_remote-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/ksck_remote.h 7 files changed, 846 insertions(+), 269 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/11554/8 -- To view, visit http://gerrit.cloudera.org:8080/11554 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476 Gerrit-Change-Number: 11554 Gerrit-PatchSet: 8 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley