Dear all,

First off, congratulations to everybody on creating this tool, which is going to revolutionise uploading to WikiCommons.

Inevitably what follows is going to be largely a list of nit-picks (and I'm sorry that I haven't looked for your project plans or bug-tracker first, in case some of the answers are already in the pipeline); but don't let any of the below take away from what is a great achievement.


So, what are some issues that struck me, when uploading the set now at

https://commons.wikimedia.org/wiki/Category:Images_released_by_British_Library_Images_Online

(cat name may move, but this is where it's at for the moment).


* Filenames -- already under discussion in a different thread, at least as regards character replacements.

I was a bit surprised to find the Artwork::title field automatically being built into the file name -- I hadn't expected this.

On the one hand, I can see that it's an important piece of WikiCommons culture to enforce: the name of the work comes first, because that is what people will first see. But in my case, I sometimes had some very long titles, so I wanted to be able to have a shortened version in the filename. To avoid the long filenames, I found that I was having to put the picture title into the first line of the description field -- not ideal. So you might want to consider adding an option to de-select this.

It would be nice for users to have a bit more information about how filenames will be created, but this will come.


* Staging area. -- I had had the impression that the initial 3 test uploads would be uploaded to a staging area, rather than the main live wiki. So I was a bit surprised when I found it was indeed the main live wiki they had been uploaded to.

Of course, this makes a lot of sense -- for example, seeing the effect of specialist templates etc. It's just about managing expectations -- and, perhaps, reassuring people that mistakes can be easily removed eg by tagging the wrongly named image with {{duplicate}}. (I ended up with the unexpected title duplication causing unwanted filenames, and then a ".jpg.jpg" set of uploads). I initially wasn't very comfortable with my mistakes happening on the live wiki for all to see, which made me feel quite stressed to start with; but then I relaxed, and started the full upload.


* Output. -- If outputting {{artwork}}, please include the standard fields in the standard order, even if some of them are empty. eg:

{{Artwork
|artist            =
|title             =
|description       =
|date              =
|medium            =
|dimensions        =
|institution       =
|location          =
|references        =
|object history    =
|credit line       =
|inscriptions      =
|notes             =
|accession number  =
|source            =
|permission        =
|other_versions    =
}}

& further fields have their standard places in the order; which pretty much corresponds to the sequence they are output in, *not* alphabetical order.

This is important, because WikiCommons is not a "write once" medium -- pages are there to be easily edited and updated, by humans.

It is useful to have all the basic fields in place, even if they are not populated, because it makes it so much easier to fill something in later -- for example, in my case, to move some of the 'description' back into the 'title'; or to add references; or transcriptions of inscriptions; or other versions, already on the Wiki.

The empty fields also help to give the edit page order and structure when you look at it; otherwise it can get messy and harder to process, if the 'description' and 'source' fields are allowed to dominate, which can get quite long and free-form.

And please keep the fields in the standard order above, so that experienced editors know exactly where to expect to look for particular information, and where to edit it.
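
To make the request concrete, here is a minimal sketch in Python of the output I have in mind. It is only an illustration, not how GWtoolset is or should be implemented: the function name render_artwork is made up, the field list is simply the one quoted above, and any field outside that list is ignored by this sketch.

```python
# Standard {{Artwork}} field order, copied from the list quoted in this mail.
ARTWORK_FIELDS = [
    "artist", "title", "description", "date", "medium", "dimensions",
    "institution", "location", "references", "object history",
    "credit line", "inscriptions", "notes", "accession number",
    "source", "permission", "other_versions",
]


def render_artwork(values):
    """Return {{Artwork}} wikitext with every standard field, in standard order.

    `values` maps field names to text; fields that are missing are still
    written out, just left empty. (Hypothetical helper, for illustration.)
    """
    width = max(len(name) for name in ARTWORK_FIELDS)
    lines = ["{{Artwork"]
    for name in ARTWORK_FIELDS:
        value = values.get(name, "")
        # Pad the names so the '=' signs line up; drop trailing spaces
        # on empty fields.
        lines.append(f"|{name.ljust(width)} = {value}".rstrip())
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_artwork({"title": "Furnival's Inn", "date": "1828"}))
```

The point is only that the order comes from a fixed list rather than from whatever order (or alphabetical sort) the input fields happen to arrive in.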


* GWtoolset fields.

The unexpected fields 'gwtoolset-title-identifier' and 'gwtoolset-url-to-the-media-file' are currently causing the template to throw warnings, which look unsightly.

If these are going to be placed in the artwork template, please edit that template, so that it doesn't throw warnings.

But is the artwork template actually the best place for these fields? They don't relate to a description of the artwork, rather a description of the upload process.

The standard place to describe the history of the upload process is in its own template, separate from the image description template -- compare for example the template left by the Flickr2Commons bot in the 'licensing' section of the page

https://commons.wikimedia.org/wiki/File:Furnival%27s_Inn,_Holborn_-_Shepherd,_1828.jpg

The advantage of this is that the 'artwork' template can be kept to a very specific function, without having its code cluttered up by other stuff. Think what the effect would be if every upload process wanted to add its fields to the artwork template -- maintenance, or even reading the code, would become a nightmare. Instead, much better to put this content in your own template, to mark the GWtoolset upload process, perhaps with an additional master parameter to turn visible output from the template off or on.


* Category section

This is one of the most important sections for hand-editing. Yes, there are nice methods to add/remove categories now built right into the interface; but categories still get edited by hand, too. Readability is therefore important.

Therefore, can you add linefeed characters, so that each [[Category:...]] directive starts on a new line?

It's a small thing. But without it the output from last night's version is almost unreadable.


* Whitespace

I can see it's useful at the moment, in the present beta stage of the code, to add a debugging dump of the tool's run-state to the end of the page.

But please can you add several lines of whitespace before it.

Normally, the category section is very easy to find, being the last thing on the page. But without whitespace, it gets buried in a big heap of text. So, fine to keep the debugging information there, but please add a few lines of whitespace before it, to make it easier to find the categories section.
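
As a rough sketch of the two layout requests (one category per line, and some blank lines before the debugging dump), something along these lines would do. The function name render_page_tail and the default of three blank lines are my own assumptions, not anything taken from the tool.

```python
def render_page_tail(categories, debug_dump="", blank_lines=3):
    """Return the end of a file page: categories, then an optional debug dump.

    Each category link goes on its own line. If a debug dump is given, it is
    separated from the categories by `blank_lines` empty lines.
    (Hypothetical helper, for illustration only.)
    """
    category_block = "\n".join(f"[[Category:{name}]]" for name in categories)
    if not debug_dump:
        return category_block + "\n"
    return category_block + "\n" + "\n" * blank_lines + debug_dump + "\n"


if __name__ == "__main__":
    print(render_page_tail(
        ["Images released by British Library Images Online"],
        debug_dump="(debugging information would go here)",
    ))
```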


* Markup

I wasn't sure how to get markup onto the page. For example, the <br /> tag can be useful if one only wants a newline, not a new paragraph. (It is only double newlines that the Wiki software treats as breaks, single newlines get rendered as spaces; so a <br /> tag is needed if you want to specify a linebreak).

However it appeared that <br /> tags were being eaten by the XML parser.

I also tried double single-quotes '' to indicate italicised text, but the software carefully turned these into Unicode escapes to preserve them. (I didn't try <i> or <em>, so maybe that would have been the way round this).

It can also be very useful to be able to add [[wikilinks]] at the offline, pre-upload stage. I presume the software will escape these as well. (Though there are workaround templates, which I presume may give a way to work round this, albeit at the expense of less readable wiki-pages).
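
For what it's worth, here is a small Python sketch of what I suspect happened to the <br /> tags. It uses Python's standard xml.etree.ElementTree parser, not whatever parser GWtoolset actually uses, and the <description> element name is made up, so treat it purely as an illustration of general XML behaviour: an unescaped <br /> is read as a child element rather than as text, whereas an escaped one (or one inside a CDATA section) survives as literal text. Whether the tool would then pass that text through to the wiki page unchanged is something I haven't tested.

```python
import xml.etree.ElementTree as ET

# Unescaped: the parser sees <br /> as a child element of <description>.
raw = ET.fromstring("<description>line one<br />line two</description>")

# Escaped with entities: the tag is just characters inside the text.
escaped = ET.fromstring(
    "<description>line one&lt;br /&gt;line two</description>"
)

# Wrapped in CDATA: same effect as escaping, but easier to write by hand.
cdata = ET.fromstring(
    "<description><![CDATA[line one<br />line two]]></description>"
)

if __name__ == "__main__":
    print(repr(raw.text))      # text before the <br /> child only
    print(repr(escaped.text))  # full text, including the literal tag
    print(repr(cdata.text))    # full text, including the literal tag
```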


* Enhancements

** {{DEFAULTSORT:}}

It would be nice to be able to specify a field in the XML to be put into a Defaultsort for the page.

For example, for anything over 100 years old, I tend to find that it's useful to specify a default sort-key of the form "DATE ITEM SEQ" -- where DATE is a 4-digit numerical date (perhaps with a suffix to indicate imprecision), ITEM is some identifier for the series or item, eg a book, that the images are drawn from; and SEQ is a padded number to indicate a sequence within that item.

Last night I got round this by smuggling my Defaultsort into one of the fields in the Artwork template; but really it ought to be placed immediately above the Category information, so it would be good to be able to load it there directly.
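
A minimal sketch of the "DATE ITEM SEQ" key, in Python. The function names, the four-digit default padding for SEQ, and the sample values are all made up for illustration; the suffix for imprecise dates is whatever convention the uploader prefers.

```python
def default_sort_key(date, item, seq, seq_width=4, date_suffix=""):
    """Build a 'DATE ITEM SEQ' sort key.

    date        -- year as an integer, zero-padded to 4 digits
    item        -- identifier for the book or series the image comes from
    seq         -- position within that item, zero-padded to `seq_width`
    date_suffix -- optional marker for an imprecise date
    (Hypothetical helper, for illustration only.)
    """
    return f"{date:04d}{date_suffix} {item} {seq:0{seq_width}d}"


def render_defaultsort(key):
    """Wrap a sort key in the DEFAULTSORT magic word."""
    return "{{DEFAULTSORT:" + key + "}}"


if __name__ == "__main__":
    print(render_defaultsort(default_sort_key(1828, "Shepherd", 7)))
```

The resulting line would then be written immediately above the category links.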


** Free text

It might be good to also be able to have the general ability to load text (eg arbitrary templates) from the XML file into the various other parts of the page outside the Artwork template. For example, particular credit templates or notes, or bespoke 'permissions' templates.

Of course it would be nice if the tool already knew about such templates; but for when it doesn't, it would be a useful option to be able to place free text in different parts of the standard page.


** Compound fields

As well as Defaultsort above, there were a number of other entries in my upload last night that were compound fields.

For example,
  Description = Title + '&#xA;' + Description
  Filename    = Short_Name - Short_Item_Name (Date), Page - Shelfmark

while 'Source' was built from two fields plus two further templates, each of which had various input fields.

Some of this is always going to be best pre-processed offline. But for simple cases, it would be nice to be able to specify multiple fields with separators, that could then be baked into the JSON file.
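
By way of illustration, the simple cases could be described by a pattern per output field, as in this Python sketch. The {Field} pattern syntax is just Python's own str.format convention, the function name build_compound is made up, and the sample record values are invented; none of this is meant as a statement about how the tool stores its mappings.

```python
# Patterns mirroring the two compound fields described above.
DESCRIPTION_PATTERN = "{Title}\n{Description}"
FILENAME_PATTERN = (
    "{Short_Name} - {Short_Item_Name} ({Date}), {Page} - {Shelfmark}"
)


def build_compound(pattern, record):
    """Fill a pattern from one record (a dict of source field -> text).

    Raises KeyError if the pattern names a field the record lacks, which
    is probably what one wants for a batch upload.
    (Hypothetical helper, for illustration only.)
    """
    return pattern.format_map(record)


if __name__ == "__main__":
    # Invented sample values, purely to show the shape of the result.
    record = {
        "Title": "Furnival's Inn, Holborn",
        "Description": "Engraving after a drawing.",
        "Short_Name": "Furnival's Inn",
        "Short_Item_Name": "Shepherd",
        "Date": "1828",
        "Page": "p. 12",
        "Shelfmark": "X.123",
    }
    print(build_compound(FILENAME_PATTERN, record))
    print(build_compound(DESCRIPTION_PATTERN, record))
```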


** Non-XML forms of input.

JSON seems increasingly popular; and might not have so many issues with escaped characters (and escapes for the escape mechanisms) as XML. Or perhaps it's just that I write simple XML by hand, but for JSON I tend to leave it to a library call to worry about...



So there are some issues. The (non-)allowed filename characters, and the presentation/layout of the final wikitext page were the ones that gave me actual unhappiness. The rest is there as a raw user's initial impressions.

But really I want to thank you for this tool, which makes batch uploading accessible to pretty much anyone who can write an XML file, rather than having to write bespoke bots and get specific bot approval for each little thing.

Hope this is useful,

All best,

   James.


_______________________________________________
Glamtools mailing list
Glamtools@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/glamtools
