Dear Vishnu,

Thank you very much for sharing the document.

The purpose of this HTML was two-fold:
- demonstrating your skills
- first step of the planning phase for the actual implementation

Below I'm providing some feedback, but I would suggest concentrating on a simple Django app at the moment and then returning to this HTML once you are "done" with Django to address (some of) the comments.

In short: if selected, I'll insist on making this document "perfect" before proceeding (and on addressing all the feedback, plus more that I didn't yet bother writing), but there's no point in asking you to spend a week making this document ten times longer and fixing tiny unimportant details that don't really demonstrate the skillset :)

On 2 April 2018 at 23:32, Mojca Miklavec wrote:
> On Mon, 2 Apr 2018 at 19:49, Vishnu wrote:
>>
>> In the database.
>> Because then it would be very easy to count the number of os for that
>> port.
>
> I'll explain tomorrow why this is suboptimal. (But there's no need to
> further optimise the database design right now.)

There are probably better resources that explain this, but here are the first hits from Google:
    https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
    https://en.wikipedia.org/wiki/Database_normalization

In an extreme case, imagine that we decide to send a questionnaire to the participants of our statistics collection, asking them some 100 optional questions, including anything from gender, age, country of origin, country of current residence, education, favourite animal, ... Then we decide that we want to compare the age distribution of users of package A vs. the age distribution of users of package B.

Your idea that allows "very easy number counting" would mean that:

At the moment you only have (submission id, port, port version, variants) in the table. You would need to extend the table to contain (submission id, submission time, user id, port, port version, variants, os version, stdlib, xcode version, age, gender, country, education, favourite animal, ...)
And if the user has 1000 ports installed, you would need to store 100x1000 cells (repeating that same information one thousand times, and then again in any subsequent submission from the same user) instead of having a single copy in a separate "questionnaire" table. Multiply that by 10,000 users submitting statistics and you end up with tens of gigabytes of data each month, just to store the results of that one-time questionnaire.

On top of that, once the user submits a questionnaire, if you keep those answers in a separate table and use proper SQL queries, you can easily get the answer to the question "what was the prevailing gender of users of package A" even for submissions that were made many months ago. If you store everything in a single monstrous table, you would either need to modify plenty of old submissions or you would not be able to get that information for old submissions at all.

Additionally, it could happen that the database crashes while you are updating old submissions. You could end up with half of the entries updated and the other half left at their old values, in an inconsistent state.

There are plenty of problems if you don't make sure that you keep your database design in good shape from the very beginning.

That's a super common use case in databases that has already been solved. One should use table joins and views. Random link (I'm sure there are better ones):
    https://db.grussell.org/sql3.html

I don't know how Django handles joins and views (some hints I skimmed through are here: https://stackoverflow.com/a/1281051/585897), but one should certainly make sure that the database design is done well. Learning more about that topic is part of the process.

On 2 April 2018 at 23:50, Vishnu wrote:
>
> Please go through this https://jsfiddle.net/vishnum98/3r4vL4L3/21/
>
> I did some changes.

Thank you very much. The chart looks ok.
For the remaining (missing) charts, just add a section (and optionally an empty box) and describe what kind of chart goes there (no need for a long paragraph, just make it clear what's on the Y axis).

I don't think we need a drop-down to select a version, but now that you put it there, what I think would be helpful to have there is something to switch between:
- absolute number of installations in that month
- number of installations of that port divided by the total number of submissions in that month

That is: having both absolute and relative numbers available.

To make it clear: don't bother actually implementing this now. You can add a placeholder to remind you about that later (or just change the contents of that drop-down to do this instead), nothing else.

We are mainly interested in the cumulative number of installations of a particular port. The version does tell us something, but not *that* much, except that the user did not update the ports for at least a month.

We could potentially make a cumulative diagram listing all versions, random example:
    https://kanbanize.com/blog/wp-content/uploads/2014/01/Cumulativeflowfinal.png
but I would worry about that *at the very end*.

A much better *global* measure would be the time since the user last updated PortIndex, but I have no clue how to get that information in a reliable way (and it's certainly not your task to worry about it).

Further comments:

* Some more items from the proposal are still missing, like whether the package is outdated, latest commits, link to tickets, ... No need to do anything fancy, just put some placeholders there.

* Build statistics will need more work. I mean: the table as it is looks nice. But we'll probably want to represent the information in two different ways. One way is listing all builds the way you did now. The other one is approximately this way:
    https://trac.macports.org/ticket/55978#Viewnr.3:Overviewofhistoryofbuildsofaparticularport

* I'll save more nitpicking for later :)

Mojca
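
P.S. To make the "use table joins" point from the database discussion above a bit more concrete, here is a minimal sketch in Python with the standard sqlite3 module. The table names (submission, installed_port), the column names, the sample rows and the helper os_counts_for_port are all made up for illustration; this is not the proposed schema and not how Django would necessarily express it. It only shows that the OS version can live once per submission and still be counted per port with a single query.

```python
import sqlite3

# Hypothetical, simplified schema: one row per submission holds the
# per-submission facts (here only the OS version), and one row per
# installed port points back to the submission it came from.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE submission (
        id         INTEGER PRIMARY KEY,
        user_id    TEXT NOT NULL,
        os_version TEXT NOT NULL
    );
    CREATE TABLE installed_port (
        submission_id INTEGER NOT NULL REFERENCES submission(id),
        port          TEXT NOT NULL,
        version       TEXT NOT NULL
    );
""")

# Made-up sample data: three submissions, two distinct ports.
conn.executemany("INSERT INTO submission VALUES (?, ?, ?)", [
    (1, "user-a", "10.13"),
    (2, "user-b", "10.13"),
    (3, "user-c", "10.12"),
])
conn.executemany("INSERT INTO installed_port VALUES (?, ?, ?)", [
    (1, "python36", "3.6.5"),
    (1, "git", "2.17.0"),
    (2, "python36", "3.6.4"),
    (3, "python36", "3.6.5"),
    (3, "git", "2.17.0"),
])


def os_counts_for_port(conn, port):
    """Return {os_version: number of submissions that list `port`}."""
    rows = conn.execute(
        """
        SELECT s.os_version, COUNT(*)
        FROM installed_port AS p
        JOIN submission AS s ON s.id = p.submission_id
        WHERE p.port = ?
        GROUP BY s.os_version
        ORDER BY s.os_version
        """,
        (port,),
    ).fetchall()
    return dict(rows)


print(os_counts_for_port(conn, "python36"))
```

The OS version is stored once per submission rather than once per installed port, and the join recovers the per-port counts when they are needed. A questionnaire table keyed by user could be joined in the same way.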