Re: improving access to telemetry data

2013-02-28 Thread Benjamin Smedberg

On 2/27/2013 12:52 PM, Josh Aas wrote:

The top priority for exposing telemetry data is an easy-to-use API. 
Secondarily, there should be a default front-end.
I have used the telemetry data for only very specific purposes, and in 
those cases it turned out that the existing frontend was absolutely 
worthless. I had to get metrics to implement a custom report (which 
currently runs on a cronjob) for bug 789697.


I fully support the ability for people to query the telemetry data 
directly. That said, do we know how this would actually work in 
practice? Elasticsearch apparently doesn't store all the data, and HBase 
is usually a pain to query. I don't know of any existing frontend that 
makes queries not suck. So basically this sounds like a proposal that 
somebody actually write a frontend.


I know that the Socorro team has already spent some significant time 
writing an elasticsearch frontend which would let us perform complicated 
queries on crash-stats, and as I understand it, it's pretty generic, so we 
might be able to reuse that frontend for telemetry as well. Perhaps this 
means that the telemetry elasticsearch instance should contain all of 
the telemetry data, and not just specific fields? cc'ing Laura who can 
help provide background/insight.



... It should not require people to apply for special privileges.


This may not be possible. Unless you simplify the query language so that 
you can't do very expressive things with it, it's pretty easy to write 
queries which are very cluster-intensive. I suspect that we will need to 
do something to monitor query usage; we should be able to log who is 
performing complex queries and inform them if they are affecting system 
performance.
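
A minimal sketch of the shape that monitoring could take - run_query and 
its arguments are hypothetical stand-ins for whatever backend actually 
executes the query:

  import logging
  import time

  SLOW_QUERY_SECONDS = 30

  def monitored_query(user, query, run_query):
      # time the query and log the user for anything expensive
      start = time.time()
      result = run_query(query)
      elapsed = time.time() - start
      if elapsed > SLOW_QUERY_SECONDS:
          logging.warning("slow query (%.1fs) from %s: %s",
                          elapsed, user, query)
      return result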


Have we discussed this project with the metrics team yet?

--BDS



Re: improving access to telemetry data

2013-02-28 Thread Josh Aas
On Thursday, February 28, 2013 8:16:50 AM UTC-6, Benjamin Smedberg wrote:

> Have we discussed this project with the metrics team yet?

When I started looking into this it wasn't clear to me that people really knew 
what they wanted - they just knew why the existing system didn't work for them. 
My first goal is to understand what people want, then see what we can build. 
Your questions are good steps towards understanding the latter, which I haven't 
really started on yet. I'm curious to know the answers. I have talked to 
metrics a bit, but mostly to get basic background information on how the system 
works. I haven't asked how to do anything else in particular yet.

FYI:

Taras has done some hacking on an experimental lightweight UI, it can be found 
here:

http://people.mozilla.org/~tglek/dashboard/

It works using JSON dumps of data, like this:

http://people.mozilla.org/~tglek/dashboard/data/DNS_CLEANUP_AGE.json
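
Pulling one of those dumps into Python for your own processing is a 
one-liner - a sketch; the payload structure isn't documented here, so 
inspect it before doing anything clever:

  import json
  from urllib.request import urlopen

  url = ("http://people.mozilla.org/~tglek/dashboard/"
         "data/DNS_CLEANUP_AGE.json")
  data = json.loads(urlopen(url).read().decode("utf-8"))
  print(type(data))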


Re: improving access to telemetry data

2013-02-28 Thread Benoit Jacob
Please, please make your plans include the ability to get raw text files
(CSV or JSON or something else, I don't care as long as I can easily parse
it). I have no use for the current front-end, and I believe that no
front-end will cover everyone's needs; not every developer is familiar
with databases (or even SQL), while almost everyone can easily do things
with raw text files.

As a second request, please make as much data as possible public.

With public data in the form of raw text files, a lot of things become
possible. Thankfully we have that for crash reports (
https://crash-analysis.mozilla.com/crash_analysis/ ) and that allows me to
make much more useful bugzilla comments than I otherwise could - because I
can link to a public CSV file and give a Bash command (with cut|grep|wc)
that reproduces the result I'm claiming.
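
For those who would rather not shell out, the same kind of check is just 
as short in Python - a sketch, with a hypothetical local dump file, 
column index, and search string; the real files define their own layout:

  import csv

  count = 0
  with open("crashdata.csv") as f:  # hypothetical dump from crash-analysis
      for row in csv.reader(f, delimiter="\t"):
          # count rows whose (hypothetical) 6th column mention a feature
          if len(row) > 5 and "some feature" in row[5]:
              count += 1
  print(count)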

Benoit

2013/2/27 Josh Aas 

> I've been thinking about how we might improve access to telemetry data. I
> don't want to get too much into the issues with the current front-end, so
> suffice it to say that it isn't meeting my group's needs.
>
> I solicited ideas from Justin Lebar and Patrick McManus, and with those I
> came up with an overview of how those of us in platform engineering would
> ideally like to be able to access telemetry data. I'd love to hear thoughts
> and feedback.
>
> ===
>
> At the highest level, we want to make decisions, mostly regarding
> optimizations, based on feedback from "the wild." We want to know when a
> change makes something better or worse. We also want to be able to spot
> regressions.
>
> The top priority for exposing telemetry data is an easy-to-use API.
> Secondarily, there should be a default front-end.
>
> An API will allow developers to "innovate at the edges" by building tools
> to fit their needs. This will also remove the one-size-fits-all requirement
> from the default front-end. The API should be easier to use than mucking
> with hadoop/hbase directly. It might be a RESTful JSON API. It should not
> require people to apply for special privileges.
>
> The default front-end should be fast, stable, and flexible. It should be
> based as much as possible on existing products and frameworks. Being able
> to modify the display of data as needs change should be easy to do, to
> avoid long wait times for new views of the data. It should provide
> generally useful views of the data, breaking down results by build (builds
> contain dates in their IDs), so we can see how results change from one
> build to another. We want to be able to see statistical analyses such as
> standard deviations, and cumulative distribution functions.
>
> We would also like to be able to run A/B experiments. This means coming up
> with a better instrumentation framework for code in the builds, but it also
> means having a dashboard that understands the results, and can show
> comparisons between A and B users that otherwise have the same builds.


Re: improving access to telemetry data

2013-02-28 Thread Benjamin Smedberg

On 2/28/2013 10:33 AM, Benoit Jacob wrote:

Please, please make your plans include the ability to get raw text files
(CSV or JSON or something else, I don't care as long as I can easily parse
it).
Could you be more specific? Note that the text files currently provided 
on crash-analysis are not the full dataset: they include a limited and 
specific set of fields. It looks to me like telemetry 
payloads typically include many more fields, and some of these are not 
single-value fields but rather more complex histograms and such. Putting 
all of that into text files may leave us with unworkably large text files.


Because the raw crash files do not include new metadata fields, this has 
led to weird engineering practices like shoving interesting metadata 
into the freeform app notes field, and then parsing that data back out 
later. I'm worried about perpetuating this kind of behavior, which is 
hard on the database and leads to very arcane queries in many cases.


What is the current volume of telemetry pings per day?

--BDS



Re: improving access to telemetry data

2013-02-28 Thread Benoit Jacob
2013/2/28 Benjamin Smedberg 

> On 2/28/2013 10:33 AM, Benoit Jacob wrote:
>
>> Please, please make your plans include the ability to get raw text files
>> (CSV or JSON or something else, I don't care as long as I can easily parse
>> it).
>>
> Could you be more specific? Note that the text files currently provided
> on crash-analysis are not the full dataset: they include a limited and
> specific set of fields.


I know; but that's already plenty of data to do many useful things.

As to being more specific, here's an example of something I'm currently
doing with CSV crash report dumps:
http://people.mozilla.org/~bjacob/gfx_features_stats/
Obviously I would be very interested in the ability to do the same with
Telemetry instead.

Another example I mentioned above is bugzilla comments getting data from
dumps; here's a link just to give an example:
https://bugzilla.mozilla.org/show_bug.cgi?id=771774#c28


> It looks to me like telemetry payloads typically include many more fields,
> and some of these are not single-value fields but rather more complex
> histograms and such. Putting all of that into text files may leave us with
> unworkably large text files.
>

Good point; so I suppose that would support using JSON instead of CSV,
as in Josh's second email in this thread, which I hadn't seen when I wrote
this.


>
> Because the raw crash files do not include new metadata fields, this has
> led to weird engineering practices like shoving interesting metadata into
> the freeform app notes field, and then parsing that data back out later.
> I'm worried about perpetuating this kind of behavior, which is hard on the
> database and leads to very arcane queries in many cases.
>

I don't agree with the notion that freeform fields are bad. Freeform plain
text is an amazing file format. It lets you add any kind of data without
administrative overhead and is still easy to parse (if the data that was
added was formatted with easy parsing in mind).

But if one considers it a bad thing that people use it, then one should
address the issues that are causing people to use it. As you mention, raw
crash files may not include newer metadata fields. So maybe that can be
fixed by making it easier, or even automatic, to include new fields in raw
crash files?

Related/similar conversation in
https://bugzilla.mozilla.org/show_bug.cgi?id=641461

Benoit





Re: improving access to telemetry data

2013-02-28 Thread Jeff Muizelaar

On 2013-02-28, at 10:44 AM, Benjamin Smedberg wrote:

> On 2/28/2013 10:33 AM, Benoit Jacob wrote:
>> Please, please make your plans include the ability to get raw text files
>> (CSV or JSON or something else, I don't care as long as I can easily parse
>> it).
> Could you be more specific? Note that the text files currently provided 
> on crash-analysis are not the full dataset: they include a limited and 
> specific set of fields. It looks to me like telemetry payloads typically 
> include many more fields, and some of these are not single-value fields but 
> rather more complex histograms and such. Putting all of that into text files 
> may leave us with unworkably large text files.

I've also been using these text files for gathering CPU-specific information:
https://github.com/jrmuizel/cpu-features

eg:

sse2 97.5126791126%
amd 30.9852560634%
coreavg 2.29447221529
coremax 32
mulicore 81.239118672%
windowsxp 34.260938801%
fourcore 19.3838329749%

('GenuineIntel', 6, 23) 16.8679157473 Core 2 Duo 45nm
('GenuineIntel', 6, 15) 11.4360128852 Core 2 Duo Allendale/Kentsfield 65nm
('GenuineIntel', 6, 42) 9.75864128134 Core i[735] Sandybridge
('AuthenticAMD', 20, 2) 7.62036395852 AMD64 C-60
('GenuineIntel', 6, 37) 6.61528670635 Core i[735] Westmere
('GenuineIntel', 15, 4) 6.41735517496 Pentium 4 Prescott 2M 90nm
('AuthenticAMD', 20, 1) 5.85243727729 AMD64 C-50
('AuthenticAMD', 16, 6) 4.60457148945 AMD64 Athlon II
('GenuineIntel', 15, 2) 3.34941643841 Pentium 4 Northwood 130nm
('GenuineIntel', 15, 6) 2.74496830572 Pentium D
('GenuineIntel', 6, 28) 2.56862420764 Atom
('AuthenticAMD', 15, 107) 1.78671055177 AMD64 X2
('GenuineIntel', 6, 22) 1.55990232387 Core based Celeron 65nm
('GenuineIntel', 6, 58) 1.47130974042 Core i[735] Ivybridge
('GenuineIntel', 6, 14) 1.18506598185 Core Duo 65nm
('GenuineIntel', 15, 3) 1.10796800574 Pentium 4 Prescott 90nm
('AuthenticAMD', 6, 8) 0.963864879489 Athlon (Palomino) XP/Duron
('GenuineIntel', 6, 13) 0.907232911584 Pentium M
('AuthenticAMD', 16, 5) 0.876674077418 Athlon II X4
('AuthenticAMD', 17, 3) 0.81163142121 
('AuthenticAMD', 18, 1) 0.7381780767 
('AuthenticAMD', 15, 44) 0.702012116998 
('AuthenticAMD', 16, 4) 0.670892570278 Athlon II X4
('AuthenticAMD', 16, 2) 0.621830221846 Athlon II X2
('GenuineIntel', 15, 1) 0.608373120562 Pentium 4 Willamette 180nm
('GenuineIntel', 6, 30) 0.578935711502 
('AuthenticAMD', 15, 127) 0.576692861288 
('AuthenticAMD', 6, 10) 0.545012602015 Athlon MP
('AuthenticAMD', 15, 104) 0.502959160501 
('AuthenticAMD', 15, 75) 0.486698496449 
('AuthenticAMD', 15, 47) 0.463148569202 
('AuthenticAMD', 15, 67) 0.423898690456 
('AuthenticAMD', 15, 95) 0.413805864493 
('GenuineIntel', 6, 54) 0.404834463636 
('AuthenticAMD', 15, 79) 0.327456131252 
('AuthenticAMD', 21, 16) 0.294654446871 
('GenuineIntel', 6, 26) 0.278954495373 
('GenuineIntel', 6, 8) 0.252881361634 Pentium III Coppermine 0.18 um
('AuthenticAMD', 21, 1) 0.234938559922 
('AuthenticAMD', 16, 10) 0.201576162988 
('AuthenticAMD', 15, 72) 0.163167353072 
('GenuineInte', 0, 0) 0.159242365198 
('AuthenticAMD', 6, 6) 0.150270964341 Athlon XP
('AuthenticAMD', 15, 76) 0.132608518906 
('GenuineIntel', 6, 9) 0.130926381245 
('AuthenticAMD', 15, 12) 0.105974672614 
('GenuineIntel', 6, 7) 0.102610397293 Pentium III Katmai 0.25 um
('GenuineIntel', 6, 44) 0.0986854094183 
('AuthenticAMD', 15, 124) 0.0964425592042 
('AuthenticAMD', 15, 4) 0.0939193527134 
('GenuineIntel', 6, 11) 0.0894336522853 Pentium III Tualatin 0.13 um
('CentaurHauls', 6, 13) 0.0832658141967 
('AuthenticAMD', 15, 28) 0.0827051016432 
('AuthenticAMD', 15, 36) 0.0695283566356 
('AuthenticAMD', 6, 7) 0.055230186521 Duron Morgan
('AuthenticAMD', 15, 43) 0.0521462674767 
('GenuineIntel', 6, 45) 0.0428945103437 
('AuthenticAM', 0, 0) 0.0420534415135 
('GenuineIntel', 15, 0) 0.0392498787459 
('AuthenticAMD', 15, 35) 0.0361659597016 
('AuthenticAMD', 6, 4) 0.0361659597016 Athlon
('AuthenticAMD', 15, 31) 0.0299981216129 
('AuthenticAMD', 6, 3) 0.0297177653362 Athlon Duron
('AuthenticAMD', 15, 39) 0.0260731337384 
('AuthenticAMD', 21, 2) 0.0179428017124 
('AuthenticAMD', 15, 15) 0.0176624454357 
('CentaurHauls', 6, 10) 0.0159803077751 
('GenuineIntel', 6, 10) 0.0117749636238 
('AuthenticAMD', 15, 63) 0.0117749636238 
('AuthenticAMD', 15, 55) 0.011494607347 
('CentaurHauls', 6, 15) 0.01037318224 
('GenuineIntel', 6, 6) 0.00953211340972 Pentium II Mendocino 0.25 um
('CentaurHauls', 6, 9) 0.00728926319567 
('GenuineIntel', 6, 5) 0.00672855064216 Pentium II Deschutes 0.25 um
('CentaurHauls', 6, 7) 0.00532676925837 VIA Ezra/Samuel 2
('AuthenticAMD', 15, 108) 0.00364463159783 
('AuthenticAMD', 15, 33) 0.00364463159783 
('GenuineInte', 6, 15) 0.00336427532108 
('AuthenticAMD', 15, 7) 0.00336427532108 
('GenuineIntel', 6, 53) 0.00308391904432 
('AuthenticAMD', 15, 8) 0.00308391904432 
('GenuineIntel', 6, 47) 0.00280356276757 
('AuthenticAMD', 16, 8) 0.00280356276757 
('GenuineIntel', 6, 46) 0.00224285021405 
('GenuineInte', 6, 23) 0.0022428502140

Re: improving access to telemetry data

2013-02-28 Thread Benjamin Smedberg

On 2/28/2013 10:59 AM, Benoit Jacob wrote:

Because the raw crash files do not include new metadata fields, this has
led to weird engineering practices like shoving interesting metadata into
the freeform app notes field, and then parsing that data back out later.
I'm worried about perpetuating this kind of behavior, which is hard on the
database and leads to very arcane queries in many cases.


I don't agree with the notion that freeform fields are bad. Freeform plain
text is an amazing file format. It lets you add any kind of data without
administrative overhead and is still easy to parse (if the data that was
added was formatted with easy parsing in mind).
The obvious disadvantage is that it is much more difficult to 
machine-process. For example, elasticsearch can't index on it (at least 
not without lots of custom parsing), and in general you can't ask tools 
like hbase or elasticsearch to filter on it without a user-defined 
function. (Regexes might work for some kinds of text processing.)


But if one considers it a bad thing that people use it, then one should
address the issues that are causing people to use it. As you mention, raw
crash files may not include newer metadata fields. So maybe that can be
fixed by making it easier, or even automatic, to include new fields in raw
crash files?
Yes, that is all filed. We can't automatically include new fields, 
because we don't know whether they are supposed to be public or private, 
but we should soon be able to have a dynamically updateable list.


Note that if mcmanus is correct, we're going to be dealing with 1M 
submissions per day here. That's a lot more than the 250k from crash-stats, 
especially because the payload is bigger. I believe that the flat files 
from crash-stats are a really useful kludge because we couldn't figure 
out a better way to expose the raw data. But that kludge will start to 
fall over pretty quickly, and perhaps we should just expose a better way 
to do queries using the databases, which are surprisingly good at doing 
these kinds of queries efficiently.
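
Even a raw query against such a database is scriptable in a few lines. A 
sketch against elasticsearch, with a hypothetical index name and field 
names, using the requests library for brevity:

  import json
  import requests

  query = {"query": {"term": {"info.appUpdateChannel": "nightly"}},
           "size": 100}
  resp = requests.post("http://localhost:9200/telemetry/_search",
                       data=json.dumps(query))
  for hit in resp.json()["hits"]["hits"]:
      print(hit["_source"].get("info", {}).get("appBuildID"))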


--BDS



Re: improving access to telemetry data

2013-02-28 Thread Justin Lebar
It sounds to me like people want both

1) Easier access to aggregated data so they can build their own
dashboards roughly comparable in features to the current dashboards.

2) Easier access to raw databases so that people can build up more
complex analyses, either by exporting the raw data from the db, or by
analyzing it in the db.

That is, I don't think we can or should export JSON with all the data
in our databases.  That is a lot of data.



Re: improving access to telemetry data

2013-02-28 Thread deinspanjer
On Thursday, February 28, 2013 12:14:52 PM UTC-5, Justin Lebar wrote:
> That is, I don't think we can or should export JSON with all the data
> in our databases.  That is a lot of data.

We currently have a little over 20 TB of compressed telemetry documents stored 
in HBase.  The compression is LZO, so probably about 60% efficiency.

The past few months, we have been averaging about 7M submissions per day, and 
the current average document size is about 50KB.
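
(Back of the envelope: 7M submissions x 50KB is roughly 350GB of raw 
JSON per day, before compression.)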

Here is a recent snapshot of those stats:

https://docs.google.com/spreadsheet/ccc?key=0AtdL1GrYQUbldHVrRmtRQXZMbmVRX2hRbFRDSXJQbUE#gid=2


Re: improving access to telemetry data

2013-02-28 Thread Bill McCloskey
- Original Message -
> From: "Justin Lebar" 
> To: "Benjamin Smedberg" 
> Cc: "Benoit Jacob" , "Josh Aas" 
> , dev-platform@lists.mozilla.org
> Sent: Thursday, February 28, 2013 9:14:52 AM
> Subject: Re: improving access to telemetry data
> 
> It sounds to me like people want both
> 
> 1) Easier access to aggregated data so they can build their own
> dashboards roughly comparable in features to the current dashboards.
> 
> 2) Easier access to raw databases so that people can build up more
> complex analyses, either by exporting the raw data from the db, or by
> analyzing it in the db.
> 
> That is, I don't think we can or should export JSON with all the data
> in our databases.  That is a lot of data.

I've used telemetry data a little bit for finding information about addon 
usage. It took me a while to figure out how to use Pig and run Hadoop jobs, and 
it would be great to have something easier to use. Based on what little I know, 
it seems like a lot of queries fit the following scheme:

1. Filter based on version and/or buildid as well as the product 
(Firefox/TB/Fennec).
2. Select a random sample of x% of all pings.
3. Dump out the JSON and process it in Python or via some other external tool.
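
In Python, step 3 might look something like this - a sketch assuming one 
JSON ping per line of the dump; the field names are hypothetical:

  import json

  with open("nightly-sample.json") as f:  # hypothetical output of steps 1-2
      for line in f:
          ping = json.loads(line)
          info = ping.get("info", {})
          # pull out whatever fields you're after
          print(info.get("OS"),
                ping.get("simpleMeasurements", {}).get("firstPaint"))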

This, at least, was sufficient for what I was doing. It sounds like it would 
also work for many of the applications people have suggested so far, although 
I might be misunderstanding.

It sounds like we might be able to come up with a few generic queries that 
could run each day. One could be for Nightly data with yesterday's buildid and 
another could be for recent Aurora submissions, etc. The data would be randomly 
sampled to generate a compressed JSON file of some reasonable size (maybe 
100MB) that would then be uploaded to an FTP server that everyone could access. 
The old files would be thrown away after a few weeks, although we could archive 
a few in case someone wants older data.

I'm sure that this wouldn't cover every single use case of telemetry. However, 
it could be used both for dashboards and to get the raw data. The random 
sampling seems like the biggest potential problem. However, you could compare 
data across a few days to see how significant the results are. At the very 
least, this data would make it easy to try out prototypes. Once you find 
something that works, you could create a more customized query that would be 
more specific to the application.

-Bill


Re: improving access to telemetry data

2013-02-28 Thread lauraxt
On Wednesday, February 27, 2013 12:52:10 PM UTC-5, Josh Aas wrote:
> I've been thinking about how we might improve access to telemetry data. I 
> don't want to get too much into the issues with the current front-end, so 
> suffice it to say that it isn't meeting my group's needs.
> 
> 

A few people have pinged me and asked me to respond on this thread, based on 
experiences with Socorro.

I think it would be good to work out the use cases that are needed, but here 
are some possible options for opening up the data:
- Open up a reporting instance of HBase for ad-hoc queries (perhaps via Pig - 
this is easy to learn and is working well for Socorro. There are newer options 
like Impala, too).
- Enable searches/faceting through ElasticSearch - we have some generic UI for 
this in the pipeline, which may be reusable.
- Create more JSON/CSV dumps and make them available.  Many of the Socorro 
reports are created based on prototypes somebody in Engineering made with CSV 
data and a script.
- Consider dumping some of the data into a relational DB. This is a common 
pattern (the so-called "data mullet") which makes querying accessible to a 
greater number of people.
- Build a simple API in front of one or more of these data sources to make it 
easier for people to write their own front ends and reporting (see the sketch 
after this list).
- Finally, work on the front end to support the most commonly needed queries 
and reports. This could fall out of the work done on some of the other options.
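
To give a flavor of the API option, here is a minimal sketch using Flask 
purely for illustration - the endpoint shape and the aggregated-JSON 
store behind it are hypothetical, not an existing service:

  import json
  from flask import Flask, jsonify

  app = Flask(__name__)

  def load_aggregate(histogram):
      # stand-in for whatever backend holds pre-aggregated histograms;
      # each file is assumed to map bucket -> count
      with open("aggregates/%s.json" % histogram) as f:
          return json.load(f)

  @app.route("/histogram/<name>")
  def histogram(name):
      return jsonify(load_aggregate(name))

  if __name__ == "__main__":
      app.run()

A front end would then just fetch /histogram/DNS_CLEANUP_AGE and render it.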

Cheers

Laura


Re: improving access to telemetry data

2013-02-28 Thread Robert Kaiser

Benjamin Smedberg schrieb:

Because the raw crash files do not include new metadata fields, this has
led to weird engineering practices like shoving interesting metadata
into the freeform app notes field, and then parsing that data back out
later. I'm worried about perpetuating this kind of behavior, which is
hard on the database and leads to very arcane queries in many cases.


I agree, and ideas on that are coming along slowly but surely, but let's 
not do the Socorro discussion here; let's stick with Telemetry in this 
thread. ;-)


I agree with bjacob that there should be a few ways of getting public 
access to data - e.g. he likes the CSV files much better than he would 
direct DB access, while I'm pretty happy with the latter on Socorro.


Robert Kaiser



Re: improving access to telemetry data

2013-03-06 Thread Monica Chew
> We would also like to be able to run A/B experiments.

I would like to see this in the form of percent experiments controlled by 
prefs, which are pseudo-randomly set on update. This would mean that telemetry 
would need to include dumps of relevant prefs. It would also make it easier to 
set default preferences where the choice isn't obvious, and enable a much 
slower rollout than the current release cycle.
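
One way the pseudo-random assignment could work - a sketch with 
hypothetical names; the point is that a stable client id hashes to a 
stable bucket, so each client keeps its assignment across sessions:

  import hashlib

  def in_experiment(client_id, experiment_name, percent):
      # hash the (client, experiment) pair into [0, 100)
      h = hashlib.sha1((client_id + experiment_name).encode("utf-8"))
      bucket = int(h.hexdigest(), 16) % 100
      return bucket < percent

  # e.g. put 10% of clients in the B arm of a pref experiment:
  # if in_experiment(client_id, "some.pref.experiment", 10): ...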

Thanks,
Monica



Re: improving access to telemetry data (Help Wanted)

2013-02-28 Thread Taras Glek



Justin Lebar wrote:

It sounds to me like people want both

1) Easier access to aggregated data so they can build their own
dashboards roughly comparable in features to the current dashboards.


I doubt people actually want to build their own dashboards. I suspect this is 
mainly a need because of deficiencies in the current dashboard.




2) Easier access to raw databases so that people can build up more
complex analyses, either by exporting the raw data from the db, or by
analyzing it in the db.

That is, I don't think we can or should export JSON with all the data
in our databases.  That is a lot of data.


From concrete examples I've seen so far, people want basic 
aggregations. My FE in http://people.mozilla.org/~tglek/dashboard/ works 
on aggregated histogram JSONs. It seems completely reasonable to 
aggregate all of the other info + simple_measurement fields (and it's on 
my TODO). This would solve all of the other concrete use-cases mentioned 
(flash versions, hardware stats).


I think we can be more aggressive still. We can also allow filtering 
certain histograms by one of those highly variable info fields (eg TAB 
animations vs gfx hardware, specific chromehangs vs something useful, 
etc) without unreasonable overhead.


I like my aggregated JSON approach because it's cheap on server CPU, and 
as long as one partitions the JSON carefully, it can be compact enough 
for gzip encoding to make it fast enough to download. This should also 
make it easy to fork the dashboards, contribute, etc.
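
The merge step for such aggregates is tiny - a sketch assuming each 
per-build file maps bucket -> count; the real schema may differ:

  import json
  from collections import Counter

  def merge_histograms(paths):
      total = Counter()
      for path in paths:
          with open(path) as f:
              for bucket, count in json.load(f).items():
                  total[bucket] += count
      return total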


I hope to feed more data into my frontend by end of today and will aim 
for a live-ish dashboard by end of next week.


For advanced use-cases, we can stick with hadoop querying.

==Help wanted==

If anyone knows a dev who is equally good at stats & programming, let me 
know. I think we have a lot of useful data, we can handle some 
visualizations of that data, but a person skilled at extracting signal 
out of noisy sources could help us squeeze the most use out of our data.



I spend too much time on management to make quick progress. I wrote up 
the prototype to prove to myself that the json schema is feasible.


If someone wants to help with aggregations, I can hook you up with raw 
JSON dumps from hadoop. For everything else, the code is on 
github (https://github.com/tarasglek/telemetry-frontend).
Help wanted: UX improvements such as easier-to-use selectors, 
incremental search, and switching to superior charting such as flotcharts.org.






Re: improving access to telemetry data (Help Wanted)

2013-02-28 Thread selenamarie
On Thursday, February 28, 2013 9:57:04 AM UTC-8, Taras Glek wrote:

> ==Help wanted==
>
> If anyone knows a dev who is equally good at stats & programming, let me
> know. I think we have a lot of useful data, we can handle some
> visualizations of that data, but a person skilled at extracting signal
> out of noisy sources could help us squeeze the most use out of our data.

I'm pretty interested in this problem. I won't be so bold as to say that I am 
"skilled" in this area, but I have been successful in finding interesting 
things in some noisy data sets. 

So, I'm putting my hand up, and I'll see what I can do over the next few days 
to hack around at it.

If others are interested in collaboration, please just ping me. :) I'm on 
Laura's team, working primarily on Socorro.

> If someone wants to help with aggregations, I can hook you up with raw 
> json dumps from hadoop. 

I'm also interested in this, and probably more qualified to do this in the 
short term, anyway. :)

Is there a wishlist? 

-selena



Re: improving access to telemetry data (Help Wanted)

2013-02-28 Thread Robert Kaiser

Taras Glek schrieb:

I doubt people actually want to build own dashboards. I suspect this is
mainly a need because of deficiencies in the current dashboard.


I disagree. I think people will want to integrate Telemetry data into 
dashboards that connect data from different sources, and not just 
Telemetry. That might be combinations with FHR data, with crash data, or 
even other things.
I, for example, would love to have stability-related data from all those 
sources trimmed down by a dashboard to digestible "this channel looks 
good/bad" values.


Robert Kaiser


Re: improving access to telemetry data (Help Wanted)

2013-02-28 Thread Taras Glek



Robert Kaiser wrote:

Taras Glek schrieb:

I doubt people actually want to build own dashboards. I suspect this is
mainly a need because of deficiencies in the current dashboard.


I disagree. I think people will want to integrate Telemetry data in
dashboards that connect data from different sources, and not just
Telemetry. That might be combinations with FHR data, with crash data, or
even other things.
I for example would love to have stability-related data from all those
sources be trimmed down by a dashboard to digestible "this channel looks
good/bad" values.
You are correct. There is a valid use-case for integrating subsets of 
telemetry data into wikis, other dashboards, etc.


Taras


Robert Kaiser



What data do you want from telemetry (was Re: improving access to telemetry data)

2013-02-28 Thread Benjamin Smedberg

On 2/28/2013 10:18 AM, Josh Aas wrote:

On Thursday, February 28, 2013 8:16:50 AM UTC-6, Benjamin Smedberg wrote:


Have we discussed this project with the metrics team yet?

When I started looking into this it wasn't clear to me that people really knew 
what they wanted

Cool. Perhaps we should start out with collecting stories/examples:

* how people currently use the Telemetry UI
* data people wish they had from Telemetry but that is unavailable or 
requires a custom report


I myself have the following examples:

Distribution of Flash versions: we wanted to know the distribution of 
Flash versions on various channels, limited to Windows. This data was 
used to guide the following projects:
* encouraging users on the aurora/beta channels to install the Flash 
beta channel, but making sure that we still had enough users on the 
Flash release channel
* planning the deployment of click-to-play blocklisting and monitoring 
its effectiveness
This data is currently being collected by a custom cronjob, published as 
CSV, and is now being reported via a custom UI here: 
https://crash-analysis.mozilla.com/bsmedberg/flash-distribution.html
I'm also working on correlating this against the crash counts and ADU 
counts for each channel to give us a "crashiness of Flash versions" 
comparison metric that we can use for Flash betas.
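
The metric itself is simple - the work is in lining up the three data 
sources. Roughly, with hypothetical per-(version, channel) inputs:

  def crashiness(crashes, adu):
      # crash count and ADU count for one Flash version on one channel
      return 100.0 * crashes / adu  # crashes per 100 ADU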


Monitoring a specific technical detail about plugin scripting: we 
weren't sure whether plugin elements needed to support direct calling 
via NPAPI. So in bug 827158 bz is adding a telemetry probe which would 
indicate whether a plugin element was ever actually called. We intend to 
check whether there are any hits on this probe once it gets to beta/release.


I am hoping to use telemetry soon to measure how often we hit 
"no active Flash instances running". If users hit 0 instances regularly, 
this will allow us to potentially restart the Flash process more often 
and work around memory leaks and other slow behavior.


I really have never used the telemetry UI frontend. I never quite 
understood how to construct interesting views of the data with it.


--BDS



Re: What data do you want from telemetry (was Re: improving access to telemetry data)

2013-02-28 Thread Patrick McManus
On Thu, Feb 28, 2013 at 10:36 AM, Benjamin Smedberg
 wrote:

> Cool. Perhaps we should start out with collecting stories/examples:
>

In that spirit:

What I almost always want to do is simply "for the last N days of
variable X, show me a CDF (at even just 10-percentile granularity) for
the histogram, and let me break that down by sets of build id and/or
OS." That's it.

For instance - what is my median time to ready for HTTP vs HTTPS
connections (I've got data for both of those)? What about their tails?
How did they change based on some checkin I'm interested in? Not
rocket science - but incredibly painful to even approximate in the
current front end... you can kind of do it, but with a bunch of fudging
and manual addition required, and it takes forever. I'll admit I get
frustrated with all the talk of EC and Hadoop and what-not when it
really seems a rather straightforward task for me to script on the
data.

Gimme the data set and I can just script it instead of spending an
hour laboriously clicking on things and waiting 15 seconds for every
click.
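
To be concrete, the whole ask fits in a few lines of Python - a sketch, 
assuming a flat dump with one raw histogram value per line:

  def cdf(values):
      # approximate CDF at 10-percentile granularity
      values = sorted(values)
      n = len(values)
      return [(p, values[min(n - 1, p * n // 100)])
              for p in range(0, 101, 10)]

  with open("time_to_ready.txt") as f:  # hypothetical dump
      samples = [float(line) for line in f if line.strip()]
  for pct, val in cdf(samples):
      print("%3d%% <= %g" % (pct, val))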

Reports from the front end seem to indicate that there are 60 million
submissions in the last month across all channels for one of the
things I'm tracking... 651K of those from nightly. fwiw.