Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3710: Kudu DML should ignore conflicts, pt2 ......................................................................
Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/4985/1/be/src/exec/data-sink.cc File be/src/exec/data-sink.cc: Line 119: if (!dst_stats->__isset.kudu_stats) dst_stats->__set_kudu_stats(TKuduDmlStats()); > pass 0 in c'tor of TKuduDmlStats? I can't bc there's no parameter, I think because I use optional fields. Line 161: ss << "NumRowErrors: " << stats.kudu_stats.num_row_errors << endl; > General question: Is there an easy way to distinguish the different types o It is unfortunate and probably not helpful enough, but I'm very worried about us presenting incorrect information to the user which I think is worse. The issue is that we can't trust the Kudu error codes enough. E.g. they re-use error codes for some different cases where I'm not confident enough that we'll always get it right. My goal is to work with them for the next release to get a more concrete contract. I was on the fence about having different Impala error codes at all, but I think it's worthwhile for de-duping the logs. If you think this is confusing, I'd prefer to just remove the warning logging w/ different error codes. http://gerrit.cloudera.org:8080/#/c/4985/1/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: Line 214: COUNTER_ADD(num_row_errors_, 1); > This is expensive right? How about we update a local counter and COUNTER_AD Done Line 305: if (!e.IsNotFound() && !e.IsAlreadyPresent() && status.ok()) { > Why not move to L325 into an else if? Done Line 310: // Kudu does not have yet have a way to programmatically differentiate between 'row > few words mixed up Done http://gerrit.cloudera.org:8080/#/c/4985/1/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: Line 430: // TODO: Refactor to reflect usage by other DML statements. > Do you intend to do that soon? The structures are a little confusing now. Yeah I'll go ahead and do this if you think the strategy is otherwise OK. Line 445: // The latest observed Kudu timestamp reported by the KuduSession at this partition. > is partition == table here? Hm yeah I guess it's not great to call this a partition... Table I think might be confusing too since there are many of these inserting across the cluster for the same table. How about I just remove 'at this partition'? -- To view, visit http://gerrit.cloudera.org:8080/4985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4eb1ad91dc355ea51de261c3a14df0f9d28c879c Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-HasComments: Yes