If I may, a couple of items of list-etiquette (polite behavior), as I understand them:

1 please reply to the list (rather than only to me), because @Mats (who responded earlier) and others on this list are much smarter than me, and might be able to help you more quickly

2 top-posting seems to take the form 'answer, then question', which is illogical to everyone, except apparently Microsoft. It is better to have the conversation 'develop' as it proceeds: all the early information at the beginning, and the more detailed towards the 'end'. That is not to say that we can't "snip" or 'do some gardening', to remove unnecessary or erroneous material, as the conversation progresses. You will notice (as below) that this also enables a posting with multiple questions to be discussed point-by-point.

Now to work...


> On Sun, 18 Oct 2020 at 21:48, dn via Python-list <python-list@python.org> wrote:
>
>     On 19/10/2020 09:09, Shaozhong SHI wrote:
>      > Even worse is that, in some cases, an addition called
>     serviceRatings as a
>      > key occur with new data unexpectedly.
>
>     "Even worse" than what?
>
>     Do you need to keep a list of acceptable/applicable/available keys?
>     (and reject or deal with others in some alternate fashion)
>
>
> > How to produce a robust Python/Panda script to coping with all these?

...
[I often use ellipsis to indicate that I have snipped 'stuff in the middle', others are more overt and will write "<snip>" or similar]


>     You may find it helpful to use the pprint ("pretty printing") library to
>     print data-structures in a more readable/structured format.
>
>     To "flatten" a dictionary, you must first be sure that there will be no
>     keys that will clash (else the second entry will completely replace the
>     first, without trace).
>
>     Thus, we will need to understand more about this particular definition
>     of "flatten" in relation to the range of incoming data. Perhaps explain
>     them in English first...
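
[To illustrate the point about clashing keys, here is a minimal sketch of one possible meaning of "flatten". It is not the definition you have in mind (only you can tell us that). Assumptions: Python 3.6+, nested keys joined with a "." separator, list positions used as key parts. The function name flatten() and its parameters are my own invention, and the sample record is cut down from your data. It raises an error on a clash instead of silently replacing the first entry.]

```python
from pprint import pprint


def flatten(data, parent_key="", sep="."):
    """Flatten nested dicts/lists into one dict with compound keys.

    Raises KeyError, rather than silently overwriting, when two
    different paths produce the same flattened key.
    Note: empty dicts and empty lists contribute no keys at all.
    """
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, list):
        items = enumerate(data)  # list positions become key parts
    else:
        return {parent_key: data}  # a plain value: nothing to flatten

    flat = {}
    for key, value in items:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        for sub_key, sub_value in flatten(value, new_key, sep).items():
            if sub_key in flat:
                raise KeyError(f"key clash on {sub_key!r}")
            flat[sub_key] = sub_value
    return flat


# Cut-down sample record, shaped like the data in this thread.
report = {
    "overall": {
        "keyQuestionRatings": [
            {"name": "Safe", "rating": "Good"},
            {"name": "Well-led", "rating": "Good"},
        ],
        "rating": "Good",
    },
    "reportDate": "2017-09-08",
}

flat_report = flatten(report)
pprint(flat_report)
```

[With this scheme a clash can only happen when an incoming key already contains the separator, eg {"a.b": 1, "a": {"b": 2}}. Whether that can occur depends on the range of incoming data, which is exactly why we need to know more about it.]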

On 19/10/2020 12:14, Shaozhong SHI wrote:
Hi, DN,

This is the result of pprint.

[{u'overall': {u'keyQuestionRatings': [{u'name': u'Safe',
u'rating': u'Requires improvement'},
                                       {u'name': u'Well-led',
u'rating': u'Requires improvement'}],
               u'rating': u'Requires improvement'},
  u'reportDate': u'2019-10-04',
  u'reportLinkId': u'63ff05ec-4d31-406e-83de-49a271cfdc43'},
 {u'overall': {u'keyQuestionRatings': [{u'name': u'Safe',
                                        u'rating': u'Good'},
                                       {u'name': u'Well-led',
                                        u'rating': u'Good'},
                                       {u'name': u'Caring',
                                        u'rating': u'Good'},
                                       {u'name': u'Responsive',
                                        u'rating': u'Good'},
                                       {u'name': u'Effective',
u'rating': u'Requires improvement'}],
               u'rating': u'Good'},
  u'reportDate': u'2017-09-08',
  u'reportLinkId': u'4f20da40-89a4-4c45-a7f9-bfd52b48f286'},
 {u'overall': {u'keyQuestionRatings': [{u'name': u'Safe',
u'rating': u'Requires improvement'},
                                       {u'name': u'Well-led',
u'rating': u'Requires improvement'},
                                       {u'name': u'Caring',
u'rating': u'Requires improvement'},
                                       {u'name': u'Responsive',
u'rating': u'Requires improvement'},
                                       {u'name': u'Effective',
                                        u'rating': u'Good'}],
               u'rating': u'Requires improvement'},
  u'reportDate': u'2016-06-11',
  u'reportLinkId': u'0cc4226b-401e-4f0f-ba35-062cbadffa8f'},
 {u'overall': {u'keyQuestionRatings': [{u'name': u'Safe',
                                        u'rating': u'Good'},
                                       {u'name': u'Well-led',
                                        u'rating': u'Good'},
                                       {u'name': u'Caring',
                                        u'rating': u'Good'},
                                       {u'name': u'Responsive',
u'rating': u'Requires improvement'},
                                       {u'name': u'Effective',
                                        u'rating': u'Good'}],
               u'rating': u'Good'},
  u'reportDate': u'2015-01-12',
  u'reportLinkId': u'a11c1e52-ddfd-4cd8-8b56-1b96ac287c96'}]


Well done! This looks so much better, and more to the point, it is easier for 'us' to see the structure - but oh dear, doesn't email wrapping make our lives difficult!


Normally, it is like this.
But sometimes, serviceRatings is added to the key list - [u'overall', u'reportDate', u'reportLinkId']

That is what I meant about dynamically growing tree.

OK, (and only you/your user can answer this question) why do none of the examples (above) have a service-rating?

I am wondering if the use of the word "unexpectedly" has translated accurately between languages - if a data-item is part of the data-input, then our code must be able to handle it or "clean" it, as specified (by the user).
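
One way to find out how "unexpected" a key really is: count how often each top-level key turns up across the whole input. A minimal sketch follows, assuming the input is a list of dicts as in the pprint output. The function name survey_keys(), the EXPECTED_KEYS set, and the two sample records (with placeholder values, and a guessed empty list for serviceRatings because I have not seen its real structure) are all mine, for illustration only.

```python
from collections import Counter

# The keys we normally see, per the key list quoted in this thread.
EXPECTED_KEYS = {"overall", "reportDate", "reportLinkId"}


def survey_keys(records):
    """Count how many records carry each top-level key."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


# Made-up sample input: placeholder values, not real reports.
records = [
    {"overall": {}, "reportDate": "2019-10-04", "reportLinkId": "a"},
    {"overall": {}, "reportDate": "2017-09-08", "reportLinkId": "b",
     "serviceRatings": []},  # real structure unknown to me
]

key_counts = survey_keys(records)
unexpected = set(key_counts) - EXPECTED_KEYS

print(key_counts)
print("unexpected keys:", unexpected)
```

Run against the real data, a report like this shows whether serviceRatings is a rare oddity or a regular part of the input, which helps decide between "handle it" and "clean it".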

- are you able to add a service-rating to each "overall" entry?
- where service-ratings are not currently-available, would it be acceptable to add the field with a value of None (or some other "sentinel-value")?
- if the analysis-phase does not consider service-ratings, can we write code to read the field from the data-source, but discard it whilst loading everything else into a Pandas DataFrame?
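
The last two options could be sketched as below. This is only a sketch under assumptions: the names normalise(), EXPECTED_KEYS, OPTIONAL_KEYS and keep_optional are hypothetical, the sample records use placeholder values, and I am guessing that the three usual keys are always present. It gives every record the same set of keys, either filling a missing service-rating with None or dropping the field altogether.

```python
EXPECTED_KEYS = ("overall", "reportDate", "reportLinkId")
OPTIONAL_KEYS = ("serviceRatings",)


def normalise(record, keep_optional=True):
    """Return a copy of record with a predictable set of keys.

    Expected keys must be present (KeyError otherwise). Optional keys
    are filled with None when missing, or left out entirely when
    keep_optional is False. Any other key is discarded.
    """
    row = {key: record[key] for key in EXPECTED_KEYS}
    if keep_optional:
        for key in OPTIONAL_KEYS:
            row[key] = record.get(key)  # None acts as the sentinel
    return row


# Made-up sample input: placeholder values, not real reports.
records = [
    {"overall": {}, "reportDate": "2019-10-04", "reportLinkId": "a"},
    {"overall": {}, "reportDate": "2017-09-08", "reportLinkId": "b",
     "serviceRatings": []},  # real structure unknown to me
]

rows_with_sentinel = [normalise(r) for r in records]
rows_without = [normalise(r, keep_optional=False) for r in records]

print(rows_with_sentinel)
print(rows_without)
```

Either list of uniform dicts can then be handed to pandas.DataFrame(), which accepts a list of dicts. Which of the two is right depends on the answers to the questions above.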


How best to handle this?

This requires understanding how the service-rating value will be used in the analysis, and thus how relevant records may be selected/ignored. Just because it features in the data, doesn't mean it needs to be included in the analysis!


Have I understood the question?
--
Regards =dn
--
https://mail.python.org/mailman/listinfo/python-list
