I think these problems can be mitigated if the CSV format is strictly defined, as I specified in my previous message.
In particular, the parser must recognize only one specific header line, containing a version number, or abort. I still insist on quoting the labels with double quotes, introducing a third column with specific string or numeric types, and replacing all the special characters in the input/output references with ":". Strictly defining the CSV version, and consequently the fields, and then specifying which kinds of data the import must fail on, will limit the complexity of importers to N different switch cases, where N is the number of circulating versions of the format (for now, 1).

- Ali

On Thu, 25 Aug 2022 13:48:36 +0000, rha...@protonmail.com wrote:
> > Not only is JSON limited to editing only through specific software or text
> > editors, but (in the latter case) it is fragile enough that a single
> > missing character can cause an entire file to fail parsing. CSV is more
> > forgiving in this regard.
>
> I think quite simply: A forgiving format is not appropriate for a standard.
>
> It'd be hard to overstate how much extra and pointless effort it creates for
> everyone, and every implementation ends up creating its own de facto standard
> for what it produces and accepts. Even doing something as simple as adding an
> extra column will not be possible in the future because it'll break
> compatibility with previous parsers.
>
> I've literally worked on projects where the csv parser has evolved into
> scan-ahead to use heuristics to understand "rules" of a csv file, and then do
> line-by-line heuristics to override those rules in pathological cases. Makes
> a bit of sense when you're trying to achieve 30 years of backwards
> compatibility. Doesn't make sense for much else.
>
> If your application users really like csv, then introduce an
> application-specific import-from-csv and export-to-csv with your own rules.
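Ali's stricter variant can be sketched as a dispatcher that accepts only exact, versioned header rows and aborts on anything else. This is a minimal Python sketch under stated assumptions, not a specification: the header text `Reference,Label,Type v1`, the type codes `tx`/`addr`/`input`/`output`, and the names `import_labels`, `parse_v1` and `LabelImportError` are all hypothetical. It also reads the "replace special characters with ':'" suggestion as using `txid:index` for both inputs and outputs, with the type column telling them apart.

```python
import csv
import io
import re

# Hypothetical v1 header: the proposal only requires that the header line
# carry a version number, so this exact text is an assumption.
HEADER_V1 = ("Reference", "Label", "Type v1")
# Hypothetical type codes for the proposed third column.
TYPES_V1 = {"tx", "addr", "input", "output"}
TXID = re.compile(r"[0-9a-f]{64}")
# Inputs and outputs share ':' as the only separator (assumed reading).
OUTPOINT = re.compile(r"[0-9a-f]{64}:[0-9]+")


class LabelImportError(ValueError):
    """Raised when the input does not match a known format version exactly."""


def parse_v1(rows):
    """Validate every v1 record; the first non-conforming record aborts the import."""
    records = []
    for record_no, row in enumerate(rows, start=1):
        if len(row) != 3:
            raise LabelImportError(f"record {record_no}: expected 3 fields, got {len(row)}")
        reference, label, kind = row
        if kind not in TYPES_V1:
            raise LabelImportError(f"record {record_no}: unknown type {kind!r}")
        if kind == "tx" and not TXID.fullmatch(reference):
            raise LabelImportError(f"record {record_no}: malformed txid")
        if kind in ("input", "output") and not OUTPOINT.fullmatch(reference):
            raise LabelImportError(f"record {record_no}: malformed txid:index")
        if kind == "addr" and not reference:
            raise LabelImportError(f"record {record_no}: empty address")
        records.append((kind, reference, label))
    return records


# One parser per circulating version: importer complexity grows with N.
PARSERS = {HEADER_V1: parse_v1}


def import_labels(text):
    """Dispatch on the exact header row; abort if it is missing or unknown."""
    rows = csv.reader(io.StringIO(text, newline=""), strict=True)
    try:
        header = tuple(next(rows, ()))
        parser = PARSERS.get(header)
        if parser is None:
            raise LabelImportError(f"unknown or missing header {header!r}; aborting")
        return parser(rows)
    except csv.Error as exc:
        # strict=True turns malformed quoting into csv.Error; treat it as fatal too.
        raise LabelImportError(f"malformed CSV: {exc}") from exc
```

Supporting a later version would mean adding one entry to `PARSERS`, which is the N-way switch described above; a record that fails validation stops the whole import instead of being guessed at.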
>
> -Ryan
>
> ------- Original Message -------
> On Thursday, August 25th, 2022 at 1:59 AM, Craig Raw <craig...@gmail.com>
> wrote:
>
> > Thanks for your thoughts, Ryan.
> >
> > Without reference to the quality feedback on this proposal, I was aware
> > when submitting it for review that it provides an excellent opportunity for
> > bike shedding. As developers, we have all experienced frustration with data
> > formats. One thing that I did not perhaps make clear enough is that this
> > format is not solely intended for developers, but also for general users who
> > are probably not well represented on this list.
> >
> > While doing research for this proposal I spoke to several professional
> > users of Sparrow Wallet (who are not developers). They all expressed a
> > desire for the format to integrate with their business processes, which are
> > driven by business tools such as Excel. Labelling provides an important
> > function in UTXO and address management in these scenarios, and needs to be
> > accessible and manageable outside of wallet software.
> >
> > If this is to be achieved, it immediately rules out JSON as a data format.
> > Not only is JSON limited to editing only through specific software or text
> > editors, but (in the latter case) it is fragile enough that a single
> > missing character can cause an entire file to fail parsing. CSV is more
> > forgiving in this regard. With respect to your comments on escaping, my
> > expectation would be that developers will be using a mature CSV library
> > rather than handling character escaping themselves. I would rather propose
> > a format that is generally usable, even if occasionally a label is escaped
> > incorrectly.
> >
> > Finally, I'll note that CSV files are already common and uncontroversial in
> > Bitcoin wallet software. Bitcoin Core, Electrum, Sparrow (and no doubt many
> > others) already export addresses and/or transactions with their labels as
> > CSV files.
> > This proposal simply attempts to create a standard for importing
> > and exporting all the labels in a wallet.
> >
> > Craig
> >
> > On Wed, Aug 24, 2022 at 9:01 PM <rha...@protonmail.com> wrote:
> >
> >> I'd strongly suggest not using CSV. Especially for a standard. I've worked
> >> with it as an interchange format many times, and it's always been a
> >> clusterfuck.
> >>
> >> Right off the bat, you have stuff like "The fields may be quoted, but this
> >> is unnecessary as the first comma in the line will always be the
> >> delimiter" which invariably leads to some implementations doing it, some
> >> implementations not doing it, and others that are intolerant of the other
> >> way.
> >>
> >> And you have also made the classic mistake of not strictly defining escape
> >> rules. So everyone will pick their own (e.g. some will \, escape commas,
> >> others will not escape them because the field is quoted and will escape
> >> quotes instead, and others will assume no escaping is required since it's
> >> the last column in a csv).
> >>
> >> Over time it morphs into its own mini-monster that introduces so much pain.
> >>
> >> On a similar note, allowing alternatives (like: txid>index vs txid:index)
> >> provides no benefit, but creates additional work for implementations (who
> >> quite likely only test formats they produce) and future incompatibilities.
> >>
> >> I know everyone loves to hate on it, but really (line-separated?) json is
> >> the way to go.
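The disagreement over escaping (undefined escape rules on one side, reliance on a mature CSV library on the other) can be made concrete. The sketch below assumes Python's standard `csv` module as that library; the thread names none. Its default `excel` dialect quotes a field only when needed and escapes a double quote by doubling it, in the style of RFC 4180. The helper names `write_labels` and `read_labels` are hypothetical.

```python
import csv
import io


def write_labels(records):
    """Serialise (reference, label) pairs with the csv module's default 'excel' dialect."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)  # QUOTE_MINIMAL: quote a field only when it needs it
    writer.writerow(["Reference", "Label"])
    writer.writerows(records)
    return buffer.getvalue()


def read_labels(text):
    """Parse the output of write_labels back into (reference, label) tuples."""
    rows = csv.reader(io.StringIO(text, newline=""))
    next(rows, None)  # skip the header line
    return [tuple(row) for row in rows]


ADDRESS = "1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg"
records = [
    (ADDRESS, "wow, such label"),  # embedded comma
    (ADDRESS, 'say "hi"'),         # embedded double quotes
    (ADDRESS, "C:\\no\\escape"),   # backslashes are not special in this dialect
]
text = write_labels(records)
print(text)
print(read_labels(text) == records)
```

The first two labels come out wrapped in double quotes, with the inner quotes doubled (`"say ""hi"""`), while the backslashes are written unchanged. The round trip succeeds because the writer and reader share one set of rules; a third-party importer that splits on the first comma, or expects `\,`, would still read the same file differently.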
> >>
> >> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "label": "wow, such label" }
> >> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txout": 4, "label": "omg this is so easy to parse" }
> >> { "tx": "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b", "txin": 0, "label": "wow this is going to be extensible as well" }
> >>
> >> -Ryan
> >>
> >> ------- Original Message -------
> >> On Wednesday, August 24th, 2022 at 2:18 AM, Craig Raw via bitcoin-dev
> >> <bitcoin-dev@lists.linuxfoundation.org> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I would like to propose a BIP that specifies a format for the export and
> >>> import of labels from a wallet. While transferring access to funds across
> >>> wallet applications has been made simple through standards such as BIP39,
> >>> wallet labels remain siloed and difficult to extract despite their value,
> >>> particularly in a privacy context.
> >>>
> >>> The proposed format is a simple two-column CSV file, with the reference
> >>> to a transaction, address, input or output in the first column, and the
> >>> label in the second column. CSV was chosen for its wide accessibility,
> >>> especially to users without specific technical expertise. Similarly, the
> >>> CSV file may be compressed using the ZIP format, and optionally encrypted
> >>> using AES.
> >>>
> >>> The full text of the BIP can be found at
> >>> https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki
> >>> and also copied below.
> >>>
> >>> Feedback is appreciated.
> >>>
> >>> Thanks,
> >>> Craig Raw
> >>>
> >>> ---
> >>>
> >>> <pre>
> >>> BIP: wallet-labels
> >>> Layer: Applications
> >>> Title: Wallet Labels Export Format
> >>> Author: Craig Raw <cr...@sparrowwallet.com>
> >>> Comments-Summary: No comments yet.
> >>> Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-wallet-labels
> >>> Status: Draft
> >>> Type: Informational
> >>> Created: 2022-08-23
> >>> License: BSD-2-Clause
> >>> </pre>
> >>>
> >>> ==Abstract==
> >>>
> >>> This document specifies a format for the export of labels that may be
> >>> attached to the transactions, addresses, inputs and outputs in a wallet.
> >>>
> >>> ==Copyright==
> >>>
> >>> This BIP is licensed under the BSD 2-clause license.
> >>>
> >>> ==Motivation==
> >>>
> >>> The export and import of funds across different Bitcoin wallet
> >>> applications is well defined through standards such as BIP39, BIP32,
> >>> BIP44, etc.
> >>> These standards are well supported and allow users to move easily between
> >>> different wallets.
> >>> There is, however, no defined standard to transfer any labels the user
> >>> may have applied to the transactions, addresses, inputs or outputs in
> >>> their wallet.
> >>> The UTXO model that Bitcoin uses makes these labels particularly valuable
> >>> as they may indicate the source of funds, whether received externally or
> >>> as a result of change from a prior transaction.
> >>> In both cases, care must be taken when spending to avoid undesirable
> >>> leaks of private information.
> >>> Labels provide valuable guidance in this regard, and have even become
> >>> mandatory when spending in several Bitcoin wallets.
> >>> Allowing users to export their labels in a standardized way ensures that
> >>> they do not experience lock-in to a particular wallet application.
> >>> In addition, by using common formats, this BIP seeks to make manual or
> >>> bulk management of labels accessible to users without specific technical
> >>> expertise.
> >>>
> >>> ==Specification==
> >>>
> >>> In order to make the import and export of labels as widely accessible as
> >>> possible, this BIP uses the comma-separated values (CSV) format, which is
> >>> widely supported by consumer, business, and scientific applications.
> >>> Although the technical specification of CSV in RFC 4180 is not always
> >>> followed, the application of the format in this BIP is simple enough that
> >>> compatibility should not present a problem.
> >>> Moreover, the simplicity and forgiving nature of CSV (over, for example,
> >>> JSON) lends itself well to bulk label editing using spreadsheet and text
> >>> editing tools.
> >>>
> >>> A CSV export of labels from a wallet must be a UTF-8 encoded text file,
> >>> containing one record per line, with records containing two fields
> >>> delimited by a comma.
> >>> The fields may be quoted, but this is unnecessary, as the first comma in
> >>> the line will always be the delimiter.
> >>> The first line in the file is a header, and should be ignored on import.
> >>> Thereafter, each line represents a record that refers to a label applied
> >>> in the wallet.
> >>> The order in which these records appear is not defined.
> >>>
> >>> The first field in the record contains a reference to the transaction,
> >>> address, input or output in the wallet.
> >>> This is specified as one of the following:
> >>> * Transaction ID (<tt>txid</tt>)
> >>> * Address
> >>> * Input (rendered as <tt>txid<index</tt>)
> >>> * Output (rendered as <tt>txid>index</tt> or <tt>txid:index</tt>)
> >>>
> >>> The second field contains the label applied to the reference.
> >>> Exporting applications may omit records with no labels or labels of zero
> >>> length.
> >>> Files exported should use the <tt>.csv</tt> file extension.
> >>>
> >>> In order to reduce file size while retaining wide accessibility, the CSV
> >>> file may be compressed using the ZIP file format, using the <tt>.zip</tt>
> >>> file extension.
> >>> This <tt>.zip</tt> file may optionally be encrypted using either AES-128
> >>> or AES-256 encryption, which is supported by numerous applications
> >>> including WinZip and 7-Zip.
> >>> In order to ensure that weak encryption does not proliferate, importers
> >>> following this standard must refuse to import <tt>.zip</tt> files
> >>> encrypted with the weaker Zip 2.0 standard.
> >>> The textual representation of the wallet's extended public key (as
> >>> defined by BIP32, with an <tt>xpub</tt> header) should be used as the
> >>> password.
> >>>
> >>> ==Importing==
> >>>
> >>> When importing, a naive algorithm may simply match against any reference,
> >>> but it is possible to disambiguate between transactions, addresses,
> >>> inputs and outputs.
> >>> For example, in the following pseudocode:
> >>> <pre>
> >>> if reference length < 64
> >>>     Set address label
> >>> else if reference length == 64
> >>>     Set transaction label
> >>> else if reference contains '<'
> >>>     Set input label
> >>> else
> >>>     Set output label
> >>> </pre>
> >>>
> >>> Importing applications may truncate labels if necessary.
> >>>
> >>> ==Test Vectors==
> >>>
> >>> The following fragment represents a wallet label export:
> >>> <pre>
> >>> Reference,Label
> >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b,Transaction
> >>> 1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address
> >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b<0,Input
> >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b>0,Output
> >>> c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b:0,Output (alternative)
> >>> </pre>
> >>>
> >>> ==Reference Implementation==
> >>>
> >>> TBD

_______________________________________________
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
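The importing pseudocode in the draft translates almost line for line into Python. This is a minimal sketch, assuming unquoted fields split on the first comma as the draft's Specification describes; it does not unquote quoted fields, and the names `classify_reference` and `read_label_export` are hypothetical. It runs against the draft's test vectors.

```python
def classify_reference(reference):
    """Mirror the draft's importing pseudocode: length first, then separator."""
    if len(reference) < 64:
        return "address"
    if len(reference) == 64:
        return "transaction"
    if "<" in reference:
        return "input"
    return "output"  # covers both txid>index and txid:index


def read_label_export(text):
    """Return (kind, reference, label) tuples from an unquoted two-column export."""
    records = []
    for line in text.splitlines()[1:]:  # the first line is a header, ignored on import
        if not line:
            continue  # tolerating blank lines is a choice this sketch makes
        reference, sep, label = line.partition(",")  # first comma is the delimiter
        if not sep:
            raise ValueError(f"no comma in record: {line!r}")
        records.append((classify_reference(reference), reference, label))
    return records


TXID = "c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b"
TEST_VECTORS = "\n".join([
    "Reference,Label",
    f"{TXID},Transaction",
    "1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg,Address",
    f"{TXID}<0,Input",
    f"{TXID}>0,Output",
    f"{TXID}:0,Output (alternative)",
])

for kind, reference, label in read_label_export(TEST_VECTORS):
    print(f"{kind:<12} {label}")
```

Both output notations are classified as outputs, but they remain distinct reference strings; an importer that wants `txid>0` and `txid:0` to label the same output still has to normalise the separator itself, which is the extra work the objection to allowing alternatives points at.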