[Numpy-discussion] JSON format for multi-dimensional data
Hi community,

This memo proposes a compact and reversible (lossless round-trip) JSON interface for multi-dimensional data, and in particular for NumPy (see issue #12481). The links to the documents are at the end of the memo.

JSON-NTV (Named and Typed Value) is a JSON format that integrates a notion of type. This format has also been implemented for tabular data (see the NTV-pandas package available in the pandas ecosystem and the PDEP12 specification).

Using this format has the following advantages:
- it takes into account data types not known to NumPy,
- it is reversible (lossless round-trip),
- it is interoperable with other tools for tabular or multi-dimensional data (e.g. pandas, Xarray),
- JSON is easy to share,
- binary encoding is possible (e.g. CBOR format),
- it can integrate data of different kinds.

The associated Jupyter Notebook presents some key points of this proposal (first draft).

Summary:
- introduction
- benefits
- multi-dimensional data
- multi-dimensional types
- JSON format
- using the NTV format
- equivalence of tabular format and multidimensional format
- Astropy specific points
- units and quantities
- coordinates
- tables
- other structures

This subject seems important to me (in particular for interoperability) and I would like your feedback before working on the implementation. In particular:
- do you think this "semantic" format is worth using?
- do you have any particular expectations, or subjects that I need to study beforehand?
- do you have any examples or test cases to offer?

And of course, any remark or comment is welcome.

Thanks in advance!
Links:
- Jupyter notebook: https://nbviewer.org/github/loco-philippe/Environmental-Sensing/blob/main/python/Tests/numpy_tests.ipynb
- JSON-NTV format: https://www.ietf.org/archive/id/draft-thomy-json-ntv-02.html
- JSON-NTV overview: https://nbviewer.org/github/loco-philippe/NTV/blob/main/example/example_ntv.ipynb
- NTV tabular format: https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#name-tabular-structure
- NTV-pandas package: https://github.com/loco-philippe/ntv-pandas/blob/main/README.md
- NTV-pandas examples: https://nbviewer.org/github/loco-philippe/ntv-pandas/blob/main/example/example_ntv_pandas.ipynb
- Pandas specification (PDEP12): https://pandas.pydata.org/pdeps/0012-compact-and-reversible-JSON-interface.html

___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com
[Numpy-discussion] Re: JSON format for multi-dimensional data
Thank you Matti for this response.

I added to issue 12481 because, in my opinion, the format proposal addresses it. However, if you think a separate issue is preferable, I can create one.

To clarify the proposed standard: it represents multidimensional data of any type. The only constraint is that each value can be represented in JSON. This is of course the case for all pandas types, but the type can also be, for example, a year, a polygon, a URI, or a type defined in darwincore or in schemaorg. This means that each library or framework must transform the JSON data into an internal value (e.g. a polygon can be translated into a shapely object). The defined types are described in the NTV Internet-Draft [2].

> - How does it handle sharing data? NumPy can handle very large ndarrays,
> and a read-only container with a shared memory location, like in DLPack
> [0] seems more natural than a format that precludes sharing data.

Concerning the first question, the purpose of this standard is complementary to that of DLPack (DLPack offers standard access mechanisms to data in memory, which avoids duplication between frameworks):
- the format is a neutral, reversible exchange format built on JSON (and therefore involves duplication), which can be used independently of any framework,
- the data types are numerous and have a broader scope than those offered by DLPack (numeric types only).

> - Is there a size limitation either on the data or on the number of
> dimensions? Could this format represent, for instance, data with more
> than 100 dimensions, which could not be mapped back to NumPy.

Regarding the second question: no, the format imposes no limitation on data size or number of dimensions (JSON does not impose limits on array sizes).

> Perhaps, like the Pandas package, it should live outside NumPy for a
> while until some wider consensus could emerge.
Regarding this initial remark, this is indeed a possible option, but it depends on the answer to the following question:
- does NumPy want a neutral JSON format for exchanging data with other frameworks (tabular, multidimensional or other)?

This is why I would like to better understand the needs (see the end of the initial email).

[2] https://www.ietf.org/archive/id/draft-thomy-json-ntv-02.html#appendix-A
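The contrast drawn in this reply between in-memory sharing (DLPack) and a JSON exchange format (which duplicates the data) can be sketched with NumPy alone. This is only an illustration under simple assumptions: a small float64 array, `np.from_dlpack` as the DLPack consumer, and a plain nested-list JSON encoding that is not the NTV format.

```python
import json

import numpy as np

# Source array standing in for data held by some framework.
source = np.arange(6, dtype=np.float64).reshape(2, 3)

# DLPack: the consumer receives an array backed by the same buffer (no copy).
shared = np.from_dlpack(source)

# JSON: the values are serialized to text and rebuilt, so the result is a copy.
# (Plain nested lists are used here; this is NOT the NTV encoding.)
text = json.dumps(source.tolist())
copied = np.array(json.loads(text), dtype=source.dtype)

shares_via_dlpack = np.shares_memory(source, shared)
shares_via_json = np.shares_memory(source, copied)
values_equal = bool(np.array_equal(source, copied))

print(shares_via_dlpack, shares_via_json, values_equal)
```

The two mechanisms answer different needs: the first avoids duplication inside one process, the second produces a text document that can be stored or sent to a tool that has no access to that memory.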
[Numpy-discussion] Re: JSON format for multi-dimensional data
Thanks Ralf,

This answers my question about the absence of a NumPy I/O format.

There are three other points related to this format proposal:
- integration of a semantic level above the number / character formats, as for datetime (e.g. units, point / polygon, URI, email, IP, encoding...),
- a neutral (platform-independent) format for multidimensional data, including multiple variables, axes, indexes and metadata,
- finally, the conversion of tabular data into multi-dimensional data (dimension greater than 2) via a neutral format.

Are these points of interest to NumPy, or do they rather concern applications built on top of NumPy?
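As an illustration of the "semantic level" mentioned in the first point, here is a minimal sketch of how a consumer might turn typed JSON values into internal objects. The `name:type` key syntax, the type names (`date`, `ip`, `uri`, `point`) and the `DECODERS` registry are assumptions made for this example only; they are not taken from the NTV Internet-Draft or from any existing package.

```python
import json
from datetime import date
from ipaddress import ip_address
from urllib.parse import urlparse

# Hypothetical registry: type name -> function building an internal value.
DECODERS = {
    "date": date.fromisoformat,
    "ip": ip_address,
    "uri": urlparse,
    "point": tuple,
}


def decode(document: str) -> dict:
    """Decode a JSON object whose keys are written as 'name:type'."""
    result = {}
    for key, raw in json.loads(document).items():
        name, _, type_name = key.partition(":")
        # Keys without a type, or with an unknown type, keep the plain JSON value.
        convert = DECODERS.get(type_name, lambda value: value)
        result[name] = convert(raw)
    return result


decoded = decode(
    '{"start:date": "2023-08-03", "host:ip": "192.0.2.1", '
    '"position:point": [2.35, 48.85], "count": 3}'
)
print(decoded)
```

The point of the sketch is that the type travels with the data, so each library can choose its own internal representation (a shapely object for a polygon, for instance) without changing the exchanged document.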
[Numpy-discussion] Re: JSON format for multi-dimensional data
Bravo for this very comprehensive work, which covers all technical/scientific data structures well!

I think we share the same goal of improving interoperability through the use of neutral formats. However, I see some differences:
- I focus more particularly on raising the semantic level by generalizing and extending the notion of type,
- I also try not to call into question what already works well, so that the impact is minimal. For example, we can have a mixed JSON structure in which one part of the data is in NTV format and another part is not. Likewise for tabular data, we can go from a "format" type to a "semantic" type without significant impact on a tool like Pandas.

More particularly, concerning multidimensional data, it seems to me that we should not limit ourselves to the ndarray structure but should also integrate associated structures such as those defined in Xarray.
[Numpy-discussion] Re: JSON format for multi-dimensional data
Thank you dom for this encouraging comment!

I agree with these remarks. I will indeed integrate the extensions made by scipp to Xarray.

Note: I am also looking for feedback regarding the analysis of tabular structures (e.g. to identify the hidden multidimensional structure): https://github.com/loco-philippe/tab-analysis/blob/main/docs/tabular_analysis.pdf. Do you think this might be of interest to scipp or Xarray?
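To make the idea of a "hidden multidimensional structure" in a table concrete, here is a small sketch using NumPy only. The column names and values are invented for the example, and the check shown (every combination of two columns occurs exactly once) is a simplified stand-in, not the method described in the tab-analysis document.

```python
import numpy as np

# Hypothetical tabular data: one row per (city, year) pair, in arbitrary order.
city = np.array(["lyon", "paris", "paris", "lyon", "paris", "lyon"])
year = np.array([2021, 2020, 2021, 2020, 2022, 2022])
value = np.array([11.0, 20.0, 21.0, 10.0, 22.0, 12.0])

# Distinct labels per column, plus the position of each row along that axis.
cities, city_pos = np.unique(city, return_inverse=True)
years, year_pos = np.unique(year, return_inverse=True)

# The two columns are "crossed" when every (city, year) combination occurs
# exactly once; only then does the table map onto a full 2-D array.
flat_pos = city_pos * len(years) + year_pos
is_crossed = (
    len(value) == len(cities) * len(years)
    and len(np.unique(flat_pos)) == len(value)
)

# Place each value at its (city, year) coordinates.
grid = np.full((len(cities), len(years)), np.nan)
grid[city_pos, year_pos] = value

print(is_crossed)
print(grid)
```

Here `cities` and `years` play the role of coordinates and `grid` is the 2-D variable; with more crossed columns the same approach gives arrays of higher dimension.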
[Numpy-discussion] Re: JSON format for multi-dimensional data
Hello,

I created a first version of a neutral format for multi-dimensional data (https://nbviewer.org/github/loco-philippe/ntv-numpy/blob/main/example/example_ntv_numpy.ipynb) and I made available a first version of a package (https://github.com/loco-philippe/ntv-numpy/blob/main/README.md) with:
- a reversible (lossless round-trip) Xarray interface,
- a reversible scipp interface,
- a reversible astropy.NDData interface,
- a reversible JSON interface.

The Notebook above shows that, thanks to this neutral format, a dataset can be shared between these tools.

In a second version, I will integrate the existing structure for tabular data (https://github.com/loco-philippe/ntv-pandas/blob/main/README.md) and the associated reversible interface.

If you have examples of other tools to integrate, or validation datasets, I'm interested!

Have a nice day
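For readers unfamiliar with the term, a "reversible (lossless round-trip)" JSON interface means that decoding the JSON gives back an array with the same values, dtype and shape. The sketch below shows the idea for numeric dtypes with NumPy and the standard `json` module. The `to_json` / `from_json` helpers and the `dtype` / `shape` / `data` keys are hypothetical names chosen for this example; they are not the ntv-numpy API or the NTV format.

```python
import json

import numpy as np


def to_json(array: np.ndarray) -> str:
    """Serialize a numeric array as JSON, keeping dtype and shape with the values."""
    return json.dumps({
        "dtype": array.dtype.str,        # e.g. "<i2" for little-endian int16
        "shape": list(array.shape),
        "data": array.ravel().tolist(),  # flat list of Python numbers
    })


def from_json(text: str) -> np.ndarray:
    """Rebuild the array from the JSON produced by to_json."""
    obj = json.loads(text)
    return np.array(obj["data"], dtype=obj["dtype"]).reshape(obj["shape"])


original = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int16)
restored = from_json(to_json(original))

print(restored.dtype, restored.shape)
```

Without the `dtype` entry, decoding would give back NumPy's default integer type rather than `int16`, which is the kind of information loss a reversible interface is meant to avoid.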
[Numpy-discussion] Re: numpy.org is now available in Japanese and Portuguese
Hello everybody,

Will there ever be a French version?

Thank you in advance for your answer, and congratulations on all your work.

Sincerely,
Philippe

On Thu, Aug 3, 2023 at 15:25, Charles R Harris wrote:

> On Wed, Aug 2, 2023 at 9:51 PM Inessa Pawson wrote:
>
>> We are excited to announce that numpy.org is now available in 2
>> additional languages: Japanese and Portuguese.
>> This wouldn’t be possible without our dedicated volunteers:
>>
>> Portuguese:
>> Melissa Weber Mendonça (melissawm)
>> Ricardo Prins (ricardoprins)
>> Getúlio Silva (getuliosilva)
>> Julio Batista Silva (jbsilva)
>> Alexandre de Siqueira (alexdesiqueira)
>> Alexandre B A Villares (villares)
>> Vini Salazar (vinisalazar)
>>
>> Japanese:
>> Atsushi Sakai (AtsushiSakai)
>> KKunai
>> Tom Kelly (TomKellyGenetics)
>> Yuji Kanagawa (kngwyu)
>>
>> Looking ahead, we’d love to translate the website into more languages. If
>> you’d like to help, please connect with the NumPy translations team on
>> Slack:
>> https://join.slack.com/t/numpy-team/shared_invite/zt-1gokbq56s-bvEpo10Ef7aHbVtVFeZv2w.
>> (Look for the #translations channel.)
>>
>> We are also building a translations team who will be working on
>> localizing documentation and educational content across the Scientific
>> Python ecosystem. If this piqued your interest, join us on the Scientific
>> Python Discord: https://discord.gg/khWtqY6RKr. (Look for the
>> #translation channel.)
>>
>> The work on the translation infrastructure is supported, in part, with
>> funding from the Chan Zuckerberg Initiative.
>>
>
> Nice. Thanks all.
>
> Chuck

--
Philippe BLANCHARD