Anupama Gupta has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11263 )

Change subject: Blogpost describing index skip scan optimization.
......................................................................


Patch Set 5:

(22 comments)

Please take a look.

http://gerrit.cloudera.org:8080/#/c/11263/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11263/4//COMMIT_MSG@6
PS4, Line 6:
           : Blogpost describing index skip scan optimization.
           :
> In reviewing blogposts, it's generally helpful to post a link to a rendered
Thanks for this Andrew. I am still not sure why images are not getting rendered 
here 
-https://github.com/AnupamaGupta01/kudu/blob/blogpost-2/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md.
Although I do see the rendered version locally, using jekyll.

Please let me know if I am missing something.


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md
File _posts/2018-08-17-index-skip-scan-optimization-in-kudu.md:

http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@9
PS4, Line 9: [index skip scan][1].
> This already seems like it's going a bit too far into implementation detail
Got it. Done.


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@11
PS4, Line 11: <!--more-->
            :
> Probably don't need this.
Done


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@14
PS4, Line 14:
> What do these do?
This is used to show the beginning excerpts of the post. I misunderstood 
earlier that it is used for newline. Moved this tag after the beginning  two 
lines and removed this from elsewhere.


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@31
PS4, Line 31:
> Maybe, add a reference (like https://en.wikipedia.org/wiki/B-tree) in-line
Makes sense. Added an in-line reference.


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@31
PS4, Line 31: 
> nit: "a B-tree", and no need to capitalize "Tree" below
Done


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@33
PS4, Line 33:  `metric
> nit: perhaps using ``s would be more reasonable here (ie. `host`). Then it'
Done


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@32
PS4, Line 32: In this case, by default, Kudu internally builds a primary key 
index (implemented as a
            : [B-tree](https://en.wikipedia.org/wiki/B-tree)) for the table 
`metrics`.
            : As shown in the table above, the ind
> IMO this doesn't convey the idea that the data is sorted by the composite o
Thanks for this suggestion. I moved the example dataset from below and used it 
as a reference to elaborate on the points you mentioned.


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@32
PS4, Line 32: In this case, by default, Kudu internally builds a primary key 
index (implemented as a
            : [B-tree](https://en.wikipedia.org/wiki/B-tree)) for the table 
`metrics`.
            : As shown in the table above, the ind
> +1 all points mentioned by Andrew here.
Done


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@39
PS4, Line 39: ?
> nit: here and elsewhere, no need for spaces before punctuation marks
Done


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@40
PS4, Line 40: the current query execution plan does not use the index. Instead, 
a full tab
> I'm not sure this gives a clear explanation as for the reason to perform a
Rephrased this paragraph to clarify this point.


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@45
PS4, Line 45: The question is,
> In general, I think the index skip scan optimization is not the only answer
Done


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@46
PS4, Line 46:
> The crux of this is the prefixes are also sorted, and all rows of a given p
Done


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@53
PS4, Line 53: For example, consider the query:
> nit: maybe, to be in sync with the CREATE TABLE statement above, write SQL
Done


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@51
PS4, Line 51: and satisfying the query predicate on the `tstamp` column.
            :
            : For example, consider the query:
            : {% highlight SQL %}
            : SELECT clusterid FROM metrics WHERE tstamp = 100;
            : {% endhighlight %}
            :
> Ah, so you _do_ have an example! I think it'd be helpful setting this up up
Moved this example near the beginning.
Yes, the data is actually stored like this. This was to emphasize on how Kudu 
would store the rows due to indexing. Removed "for simplicity" now.


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@61
PS4, Line 61:  `host` = `heli
> nit: backticks here too and everywhere else that has code snippets
Done


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@68
PS4, Line 68:
> It's not up to 13x, but it should be arbitrarily optimizable based on the c
Makes sense. Removed "up to" here.


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@69
PS4, Line 69: This optimizatio
> nit: similar to my comments in the patch, it might help defining these up f
Done


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@73
PS4, Line 73:
            : Based on our experiments, on up to 10 million rows per tablet (as 
shown below), we found that the skip scan performa
> Yep, it would be nice to add some details around the data and reasoning bac
1) Yes, these experiments were based on table schema and query pattern 
mentioned above.
2) Added an explanation for this. We used sqrt() based on our findings from the 
performance graphs.
3) Actually, I did not experiment with using other criteria to disable skip 
scan. I do agree that it will be interesting to explore other options too.


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@73
PS4, Line 73:
            : Based on our experiments, on up to 10 million rows per tablet (as 
shown below), we found that the skip scan performa
> This could use some explanation as to why sqrt(total_num_rows) was chosen.
Added explanation about how we came to using this simple heuristic. Yes, it is 
the total number of rows in the table.


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@78
PS4, Line 78: keys exceeds ![](https://latex.codecogs.com/gif.latex
> This is for the schema and query pattern mentioned earlier, right?  Maybe,
Done


http://gerrit.cloudera.org:8080/#/c/11263/4/_posts/2018-08-17-index-skip-scan-optimization-in-kudu.md@82
PS4, Line 82: ![png]({{ site.github.url 
}}/img/index-skip-scan/skip-scan-performance-graph.png){:height="1000px" 
width="600px" .img-responsive}
            :
> I generally see the conclusion as a place to summarize, in addition to talk
Done



--
To view, visit http://gerrit.cloudera.org:8080/11263
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-MessageType: comment
Gerrit-Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e
Gerrit-Change-Number: 11263
Gerrit-PatchSet: 5
Gerrit-Owner: Anupama Gupta <ag3...@columbia.edu>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Anupama Gupta <ag3...@columbia.edu>
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Comment-Date: Tue, 04 Sep 2018 02:00:32 +0000
Gerrit-HasComments: Yes

Reply via email to