Hi Vicențiu, Vladislav and the community,

I'm ready to answer the questions asked in the last email:
 
a. How are arrays sorted if the values inside them are a mix of objects,
arrays, literals, numbers.

The order of the values in a JSON array is preserved (see
https://stackoverflow.com/a/7214312/547065, thanks to Vladislav Vaintroub), so
there’s no need to sort the arrays. We just need to parse the elements of each
array and sort any objects inside it recursively.
 
b. How do you define a sorting criteria between two JSON objects in this case?

We can sort the keys in ASCIIbetical order
(https://en.wikipedia.org/wiki/ASCII#Character_order), since that is easier to
implement in C++ than other sorting criteria.
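
A rough sketch of the idea in a. and b., written in Python only for brevity (the real implementation would be C++ inside the server). The function name sort_keys_recursively is hypothetical and not part of any existing API; it sorts object keys by code point, which is ASCIIbetical order for ASCII keys, and leaves array order alone.

```python
import json


def sort_keys_recursively(value):
    """Return a copy of a parsed JSON value with object keys in code-point order."""
    if isinstance(value, dict):
        # sorted() compares strings by code point, so "B" < "D" < "a".
        return {key: sort_keys_recursively(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Array order is significant: recurse into elements, never reorder them.
        return [sort_keys_recursively(item) for item in value]
    return value  # strings, numbers, true/false/null are left as they are


doc = json.loads('{"a": 0, "B": {"D": 1, "C": 2}, "A": [{"z": 1, "y": 2}]}')
print(json.dumps(sort_keys_recursively(doc), separators=(",", ":")))
```

Uppercase keys sort before lowercase ones here, which is why "B" and "D" come before "a" in the test cases below.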
 
c. JSON is represented as text, however one can use it to store floating point
values. How do you plan to compare doubles and how would those values be
sorted? For example: 1 vs 1.0 vs 1.00 or 1000 vs 1e3?

This one really is a problem to solve. I plan to convert every number to a long
double, round it to a fixed number of decimals (such as 8 digits after the
decimal point?), and then convert it back to a string, so that equal numbers
end up with the same textual form.
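
To make the number handling concrete, here is a minimal Python sketch. It assumes a Python float (an IEEE double, not a long double) and the tentative 8 decimals; canonical_number is a hypothetical helper name. Adding 0.0 after rounding is an extra assumption of this sketch, so that -0.0 and tiny negative values do not come out as "-0.00000000".

```python
def canonical_number(text, decimals=8):
    """Map a JSON number literal to one fixed-decimal string form."""
    # float() accepts every JSON number spelling: 1, 1.0, 1.00, 1e3, ...
    # "+ 0.0" turns a negative zero into a positive zero (assumption of this sketch).
    value = round(float(text), decimals) + 0.0
    return f"{value:.{decimals}f}"


for literal in ("1", "1.0", "1.00", "1000", "1e3"):
    print(literal, "->", canonical_number(literal))
```

Very large integers lose precision when converted to a double, so they would need separate treatment.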
 
d. What's the priority of null values, are they first, last?

As described in a., arrays are never reordered, so this question no longer
applies.
 
Here are some test cases applying my ideas:
 
TEST CASE #1
 
'{"a": 0, "B": {"C": 1}, "D": 2}', '{"A": 7, "C": 9, "B": 8}'
 
JSON_NORMALIZE returns
'{"B":{"C":1.00000000},"D":2.00000000,"a":0.00000000}' and
'{"A":7.00000000,"B":8.00000000,"C":9.00000000}' respectively.

JSON_EQUALS returns 0.
 


 
TEST CASE #2
 
'{"a": 0, "B": {"C": 100}, "D": 2}', '{"B": {"C": 1e2}, "a": 0.0, "D": 2.00}'
 
JSON_NORMALIZE returns
'{"B":{"C":100.00000000},"D":2.00000000,"a":0.00000000}' and
'{"B":{"C":100.00000000},"D":2.00000000,"a":0.00000000}' respectively.

JSON_EQUALS returns 1.



 
TEST CASE #3
 
'{"A": 0, "B": [{"C": 1, "E": 2}, {"A": 0, "D": 2}], "D": 2}',
'{"A": 0, "B": [{"A": 0, "D": 2}, {"C": 1, "E": 2}], "D": 2}'

JSON_NORMALIZE returns
'{"A":0.00000000,"B":[{"C":1.00000000,"E":2.00000000},{"A":0.00000000,"D":2.00000000}],"D":2.00000000}' and
'{"A":0.00000000,"B":[{"A":0.00000000,"D":2.00000000},{"C":1.00000000,"E":2.00000000}],"D":2.00000000}' respectively.

JSON_EQUALS returns 0.
 
 
 
TEST CASE #3b
 
'{"A": 0, "B": [{"A": 0, "D": 2}, {"C": 1, "E": 2}], "D": 2}',
'{"A": 0, "B": [{"A": 0, "D": 2}, {"C": 1, "E": 2}], "D": 2}'

JSON_NORMALIZE returns
'{"A":0.00000000,"B":[{"A":0.00000000,"D":2.00000000},{"C":1.00000000,"E":2.00000000}],"D":2.00000000}' and
'{"A":0.00000000,"B":[{"A":0.00000000,"D":2.00000000},{"C":1.00000000,"E":2.00000000}],"D":2.00000000}' respectively.

JSON_EQUALS returns 1.
 
 
 
TEST CASE #4
 
'[null,1,[2,3],true,false]', '[null,1,[2],false]'

JSON_NORMALIZE returns
'[null,1.00000000,[2.00000000,3.00000000],true,false]' and
'[null,1.00000000,[2.00000000],false]' respectively.

JSON_EQUALS returns 0.
 
 
 
TEST CASE #5
 
 '{}', '{}'
 
JSON_NORMALIZE returns '{}' and '{}' respectively.

JSON_EQUALS returns 1.
 
 
 
 
 
TEST CASE #6
 
'[]', '[]'
 
JSON_NORMALIZE returns '[]' and '[]' respectively.

JSON_EQUALS returns 1.
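
The test cases above can be checked against a small prototype. This is only a Python sketch of the proposed behaviour, not MariaDB code: normalize and equals are hypothetical stand-ins for JSON_NORMALIZE and JSON_EQUALS, numbers go through a Python float rather than a long double, and the trailing commas in the arrays are treated as typos.

```python
import json


def _emit(value):
    """Serialize a parsed JSON value in the proposed normalized form."""
    if isinstance(value, dict):
        # Keys in code-point (ASCIIbetical) order, no whitespace.
        return "{" + ",".join(
            json.dumps(key) + ":" + _emit(value[key]) for key in sorted(value)
        ) + "}"
    if isinstance(value, list):
        # Array order is preserved.
        return "[" + ",".join(_emit(item) for item in value) + "]"
    if value is None or isinstance(value, (bool, str)):
        return json.dumps(value)  # null, true/false, quoted string
    # Remaining case is a number: fixed 8 decimals; "+ 0.0" avoids "-0.00000000".
    return f"{round(float(value), 8) + 0.0:.8f}"


def normalize(text):
    """Stand-in for JSON_NORMALIZE: parse, then emit the normalized string."""
    return _emit(json.loads(text))


def equals(left, right):
    """Stand-in for JSON_EQUALS: 1 if the normalized strings match, else 0."""
    return int(normalize(left) == normalize(right))


print(normalize('{"a": 0, "B": {"C": 1}, "D": 2}'))
print(equals('{"a": 0, "B": {"C": 100}, "D": 2}',
             '{"B": {"C": 1e2}, "a": 0.0, "D": 2.00}'))
```

With this prototype, the comparison reduces to a plain string comparison of the two normalized documents.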
 
In addition, I’ve also looked at the pandas Python library
(https://github.com/pandas-dev/pandas/blob/master/pandas/io/json/_normalize.py#L249-L355)
and noticed that its json_normalize function normalizes semi-structured JSON
data into a flat table. This gives me another idea: JSON_NORMALIZE could work
the same way and generate a flat table (making sure the row names are in
ASCIIbetical order), producing a row name vector (the names in the table's
header row), a column number counter and a matrix storing the values.
JSON_EQUALS would then first compare the column number counters, then the row
name vectors, and finally the value matrices, which should give a fast and
efficient comparison algorithm for JSON arrays.
 
E.G.
 
JSON data:
 
[{'state': 'Florida',
  'shortname': 'FL',
  'info': {'governor': 'Rick Scott'},
  'counties': [{'name': 'Dade', 'population': 12345},
               {'name': 'Broward', 'population': 40000},
               {'name': 'Palm Beach', 'population': 60000}]},
 {'state': 'Ohio',
  'shortname': 'OH',
  'info': {'governor': 'John Kasich'},
  'counties': [{'name': 'Summit', 'population': 1234},
               {'name': 'Cuyahoga', 'population': 1337}]}]
 
Table:
 
             
     info.governor        name  population    state  shortname
0    Rick Scott           Dade       12345  Florida         FL
1    Rick Scott        Broward       40000  Florida         FL
2    Rick Scott     Palm Beach       60000  Florida         FL
3    John Kasich        Summit        1234     Ohio         OH
4    John Kasich      Cuyahoga        1337     Ohio         OH
 
Column Number Counter: 5
 
Row Name Vector: ["info.governor","name","population","state","shortname"]
 
Data Matrix:
 
Rick Scott           Dade       12345  Florida         FL
Rick Scott        Broward       40000  Florida         FL
Rick Scott     Palm Beach       60000  Florida         FL
John Kasich        Summit        1234     Ohio         OH
John Kasich      Cuyahoga        1337     Ohio         OH
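
A simplified Python sketch of the flat-table comparison follows. It is not pandas' json_normalize algorithm: it only flattens nested objects into dotted names and does not expand nested arrays such as 'counties'. The names flatten, to_table and tables_equal are hypothetical, and a missing field is represented as None, so this sketch cannot tell a missing field from an explicit null.

```python
def flatten(record, prefix=""):
    """Flatten nested objects into one level, e.g. {'info': {'governor': x}}
    becomes {'info.governor': x}. Arrays are kept as plain values."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


def to_table(records):
    """Return (column number counter, row name vector, data matrix)."""
    rows = [flatten(record) for record in records]
    names = sorted({name for row in rows for name in row})  # code-point order
    matrix = [[row.get(name) for name in names] for row in rows]
    return len(names), names, matrix


def tables_equal(left, right):
    """Compare the cheap parts first: count, then names, then values."""
    count_l, names_l, matrix_l = to_table(left)
    count_r, names_r, matrix_r = to_table(right)
    if count_l != count_r:
        return False
    if names_l != names_r:
        return False
    return matrix_l == matrix_r


states = [
    {"state": "Florida", "shortname": "FL", "info": {"governor": "Rick Scott"}},
    {"state": "Ohio", "shortname": "OH", "info": {"governor": "John Kasich"}},
]
print(to_table(states))
```

Because the names are sorted, two arrays whose objects list the same keys in a different order produce identical tables.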





That's all for my ideas so far. Please correct me if I made some mistakes.


Cheers!

Songlin


 
 
------------------ Original ------------------
From: "Hollow Man" <hollow...@hollowman.ml>
Date: Fri, Apr 2, 2021 00:20 AM
To: "Vladislav Vaintroub" <vvaintr...@gmail.com>; "Vicențiu Ciorbaru" <vicen...@mariadb.org>
Cc: "maria-developers" <maria-developers@lists.launchpad.net>
Subject: Re:RE: [Maria-developers] GSOC21: MDEV-16375 & MDEV-23143

 Hi Vicențiu and Vladislav,


Thanks for all your suggestions. I now have a more comprehensive view of the
issues I'm going to face.


I'll start by checking other databases, the pandas Python library and some
other libraries to see what I can learn from their experience, and I'll fill
out my proposal with tests for corner cases that cover points a, b, c, d and
other potential issues.


Songlin

------------------ Original ------------------
From: "Vladislav Vaintroub" <vvaintr...@gmail.com>
Date: Thu, Apr 1, 2021 04:08 PM
To: "Vicențiu Ciorbaru" <vicen...@mariadb.org>; "Hollow Man" <hollow...@hollowman.ml>
Cc: "maria-developers" <maria-developers@lists.launchpad.net>
Subject: RE: [Maria-developers] GSOC21: MDEV-16375 & MDEV-23143

Vicentiu, Hollow Man,

The order of the values in JSON array is preserved.

https://stackoverflow.com/a/7214312/547065 is a good answer about that,
which contains the quote of the specification.

So that’s not a tricky case

From: Vicențiu Ciorbaru
Sent: Thursday, 1 April 2021 09:59
To: Hollow Man
Cc: maria-developers
Subject: Re: [Maria-developers] GSOC21: MDEV-16375 & MDEV-23143

Hi Songlin!

It's great that you are excited about this project! Here are my thoughts on 
your proposal and what I think you should focus on:

JSON_NORMALIZE seems simple at first, but I believe there are a lot of corner 
cases. In order to get a proper specification for this function can you have a 
look at other databases, to see if they implement something similar? Have a 
look at the pandas python library, can you learn from their experience?


Normalizing JSON can have some tricky cases such as:


a. How are arrays sorted if the values inside them are a mix of objects, 
arrays, literals, numbers.
b. How do you define a sorting criteria between two JSON objects in this case?
c. JSON is represented as text, however one can use it to store floating point 
values. How do you plan to compare doubles and how would those values be 
sorted? For example: 1 vs 1.0 vs 1.00 or 1000 vs 1e3?
d. What's the priority of null values, are they first, last?



The way we should handle this project is via TDD (Test Driven Development). You 
would first write your test cases, covering as many corner cases as possible, 
then implement the code such that it passes all the tests.

I suggest you add to your proposal some examples of how you define 
JSON_NORMALIZE and JSON_EQUALS to behave, so that we can see you have thought 
about points a, b, c, d from above.

As for JSON_EQUALS, assuming JSON_NORMALIZE is done correctly, it may work as a 
simple strcmp between two normalized JSON objects, but I am not 100% confident 
at this point, you would have to prove it :)


Vicențiu

On Tue, 30 Mar 2021 at 09:00, Hollow Man <hollow...@hollowman.ml> wrote:


Hi community!

I've shared my proposal at
https://drive.google.com/file/d/1sv0qbqt9W-ob3GqxygWwRGurpRS1lCiv/view and
hope to get some feedback from the community.

Songlin



------------------ Original ------------------
From: "Hollow Man" <hollow...@hollowman.ml>
Date: Thu, Mar 11, 2021 00:17 AM
To: "maria-developers" <maria-developers@lists.launchpad.net>
Subject: GSOC21: MDEV-16375 & MDEV-23143


Hi MariaDB community!

Glad to be here! My GitHub account is @HollowMan6. Though I'm new to the
MariaDB community, I'm interested in MDEV-16375 & MDEV-23143: Function to
normalize a json value & missing a JSON_EQUALS function for this year's GSOC
project. Here are my first thoughts on these issues:

I have checked part of the codebase and I think the two issues can be merged
into one. First we can create a function named JSON_NORMALIZE to normalize the
JSON, which automatically parses the input JSON document, recursively sorts
the keys (for objects) / sorts the numbers (for arrays), removes the spaces,
and then returns the JSON document string.

Then we create a function named JSON_EQUALS, which compares two JSON documents
for equality by first normalizing each of them separately with JSON_NORMALIZE
and then comparing the two results exactly, as binary strings.

I have taken some inspiration from Item_func_json_keys and json_scan_start for
parsing JSON documents, and I think it's possible to sort the keys of objects
using std::map from the STL.

That's all for my ideas so far. Please correct me if I made some mistakes, and
I'm going to work on my ideas later.

Cheers!

Hollow Man






_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp
