Lars Volker has posted comments on this change. Change subject: IMPALA-2523: Make HdfsTableSink aware of clustered input ......................................................................
Patch Set 4: (17 comments) http://gerrit.cloudera.org:8080/#/c/4863/4/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: Line 268: // Pass the row batch to the writer. If new_file is returned true then the current > I find this comment a little confusing. Maybe explain more the "why", i.e. Done. I had moved this from elsewhere, but tried to make it more clear. Line 272: do { > I feel like this might be clearer if you write it as while(true) and then b Done Line 288: // TODO: Should we adapt hdfs writer to accept start,num pair instead of a vector? > My guess is that it would only help a little bit. It's nice to use the same Ok Line 296: PartitionPair* next_partition_pair = NULL; > It would be more efficient to keep track of the previous key and comparing Done PS4, Line 301: Flush > I think it might be clearer to just say "Write rows" instead of "flush". Fl Done Line 301: // Flush previous partition > I think the comments in here are a little too low level. Here maybe a singl Done Line 310: // Collect row index > This comment isn't too helpful. Done PS4, Line 313: Flush > "Write rows to" is probably clearer than "flush". Done PS4, Line 607: WriteRowsOfPartition > WriteRowsToPartition? Done http://gerrit.cloudera.org:8080/#/c/4863/4/be/src/exec/hdfs-table-sink.h File be/src/exec/hdfs-table-sink.h: PS4, Line 183: rows > Maybe reword to be clearer, took me a while to figure out: Done PS4, Line 209: partitions > partition's (missing apostrophe). Done Line 209: /// Writes all rows of a partition to the partitions writer and clears the row index > This could be clearer that it's only writing the rows with indices in 'part Done PS4, Line 215: is expected to > This could be more declarative - i.e. "must" instead of "is expected to" Done http://gerrit.cloudera.org:8080/#/c/4863/4/tests/query_test/test_insert.py File tests/query_test/test_insert.py: PS4, Line 71: text > parquet, right? Done http://gerrit.cloudera.org:8080/#/c/4863/4/tests/query_test/test_insert_behaviour.py File tests/query_test/test_insert_behaviour.py: Line 476: > Hmm, I think we could also do with a test that writes some large partitions Done. It takes 30 seconds on my machine. Too much for core? Line 477: > Extra blank line Done Line 501: assert len(files) == 1 > Comment why we only expect one file - it might not be that obvious to peopl Done -- To view, visit http://gerrit.cloudera.org:8080/4863 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibeda0bdabbfe44c8ac95bf7c982a75649e1b82d0 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes