[Wikidata] Re: units

2023-01-24 Thread John Erling Blad
SPARQL is like the secret language of the Necromongers: it is completely
incomprehensible to the uninitiated who haven't been through the Gates of
the Underworld.

It is perhaps the single most difficult thing to grasp for users of
Wikidata.

On Wed, 25 Jan 2023, 00:32, Marco Neumann wrote:

> Enjoy
>
> Best,
> Marco
>
> On Tue, Jan 24, 2023 at 11:30 PM Olaf Simons <
> olaf.sim...@pierre-marteau.com> wrote:
>
>> ...everything I did was, once again, 30 times more complicated,
>>
>> many thanks!
>> Olaf
>>
>>
>> > Marco Neumann  wrote on 24.01.2023 at 23:48 CET:
>> >
>> >
>> > https://tinyurl.com/2nbqnavq
>>
>> Dr. Olaf Simons
>> Forschungszentrum Gotha der Universität Erfurt
>> Am Schlossberg 2
>> 99867 Gotha
>> Büro: +49-361-737-1722
>> Mobil: +49-179-5196880
>> Privat: Hauptmarkt 17b/ 99867 Gotha
>>
>
>
> --
>
>
> ---
> Marco Neumann
>
>


[Wikidata] Re: History of some original Wikidata design decisions?

2021-07-24 Thread John Erling Blad
Just to clarify, “Wikidata The Movie” was (once upon a time) a standing
joke in the original team, with wild guesses about who would play the
different characters.

But now, off to thinking Deep Thoughts.

On Sat, Jul 24, 2021 at 11:54 PM John Erling Blad  wrote:

> > A Wikidata book would be most excellent,
>
> So, what about “Wikidata – The Movie”? Who will cast Denny? Would it be
> Anthony Hopkins?
>
> John will now go to bed! (I'm not here, etc…)
>
> On Fri, Jul 23, 2021 at 8:10 PM Ed Summers  wrote:
>
>>
>> > On Thu, Jul 22, 2021 at 6:56 PM Denny Vrandečić <
>> > dvrande...@wikimedia.org> wrote:
>> > >
>> > > I hope that helps with the historical deep dive :) Lydia and I
>> > > really should write that book!
>>
>> A Wikidata book would be most excellent, especially one by both of you!
>> If there's anything interested people can do to help make it happen (a
>> little crowdfunding or what have you) please let us know.
>>
>> //Ed


[Wikidata] Re: History of some original Wikidata design decisions?

2021-07-24 Thread John Erling Blad
> A Wikidata book would be most excellent,

So, what about “Wikidata – The Movie”? Who will cast Denny? Would it be
Anthony Hopkins?

John will now go to bed! (I'm not here, etc…)

On Fri, Jul 23, 2021 at 8:10 PM Ed Summers  wrote:

>
> > On Thu, Jul 22, 2021 at 6:56 PM Denny Vrandečić <
> > dvrande...@wikimedia.org> wrote:
> > >
> > > I hope that helps with the historical deep dive :) Lydia and I
> > > really should write that book!
>
> A Wikidata book would be most excellent, especially one by both of you!
> If there's anything interested people can do to help make it happen (a
> little crowdfunding or what have you) please let us know.
>
> //Ed


Re: [Wikidata] Big numbers

2019-10-07 Thread John Erling Blad
In the DecimalMath class you escalate the scale to the combined length of the
fractional parts during multiplication. That is what they teach in school, but
it leads to false precision. It can be argued that this is both right and
wrong. Is there any particular reason (use case) why you do that? I don't
escalate the scale in the Lua lib.
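
To make the point concrete, a minimal sketch in plain Lua (not the actual
DecimalMath code): multiplying two one-decimal values the schoolbook way
yields a two-decimal result – the false precision mentioned above – while
capping the scale at the larger operand scale avoids the escalation.

    local function scale( s )
        local frac = s:match( '%.(%d+)$' )
        return frac and #frac or 0
    end

    local function multiply( a, b, escalate )
        local digits = escalate
            and scale( a ) + scale( b )            -- schoolbook scaling
            or math.max( scale( a ), scale( b ) )  -- capped scaling
        return string.format( '%.' .. digits .. 'f', tonumber( a ) * tonumber( b ) )
    end

    print( multiply( '1.5', '2.5', true ) )   -- "3.75": two decimals from one-decimal inputs
    print( multiply( '1.5', '2.5', false ) )  -- "3.8": the scale is not escalated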

I see you have bumped into the problem of whether precision is a last-digit
resolution or a count of significant digits. There are a couple of (several!)
different definitions, and I'm not sure which one is right. 3 ± 0.5 meter is
comparable to 120 ± 10 inches, but interpreting "3 meter" as having a default
precision of ±0.5 meter is problematic. It is easier to see the problem if you
compare with a prefix: what is the precision of 3000 meter vs 3 km? And when do
you count significant digits? Is zero (0) a significant digit?
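
For the conversion side, a rough sketch (assumed behaviour, not the actual
Quantity code) of rounding a converted value to the decade of its converted
uncertainty, which reproduces the "3 meters" example Daniel gives below:

    local function convert( value, uncertainty, factor )
        local converted = value * factor
        local convertedUncertainty = uncertainty * factor
        -- keep only the digits that are significant given the uncertainty
        local step = 10 ^ math.floor( math.log10( convertedUncertainty ) )
        return math.floor( converted / step + 0.5 ) * step, convertedUncertainty
    end

    -- 3 m ± 0.5 m converted to inches (1 m ≈ 39.3701 in):
    print( convert( 3, 0.5, 39.3701 ) )  -- 120 (± ~19.7), not 118.11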

Otherwise I find this extremely amusing. When I first mentioned this we got
into a fierce discussion, and the conclusion was that we should definitely
not use big numbers. Now we do. :D

On Mon, Oct 7, 2019 at 10:44 AM Daniel Kinzler 
wrote:

> Am 07.10.19 um 09:50 schrieb John Erling Blad:
> > Found a few references to bcmath, but some weirdness made me wonder if
> it really
> > was bcmath after all. I wonder if the weirdness is the juggling with
> double when
> > bcmath is missing.
>
> I haven't looked at the code in five years or so, but when I wrote it,
> Number
> was indeed bcmath with fallback to float. The limit of 127 characters
> sounds
> right, though I'm not sure without looking at the code.
>
> Quantity is based on Number, with quite a bit of added complexity for
> converting
> between units while considering the value's precision. e.g. "3 meters"
> should
> not turn into "118,11 inch", but "118 inch" or even "120 inch", if it's the
> default +/- 0.5 meter = 19,685 inch, which means the last digit is
> insignificant. Had lots of fun and confusion with that. I also implemented
> rounding on decimal strings for that. And initially screwed up some edge
> cases,
> which I only realized when helping my daughter with her homework ;)
>
> --
> Daniel Kinzler
> Principal Software Engineer, Core Platform
> Wikimedia Foundation
>


Re: [Wikidata] Big numbers

2019-10-07 Thread John Erling Blad
Found a few references to bcmath, but some weirdness made me wonder if it
really was bcmath after all. I wonder if the weirdness is the juggling with
doubles when bcmath is missing.



On Mon, Oct 7, 2019 at 9:18 AM Jeroen De Dauw 
wrote:

> Hey John,
>
> I'm not aware of any documentation, though there probably is some
> somewhere. What I can point you to is the code dealing with numbers:
> https://github.com/wmde/Number
>
> Cheers
>
> --
> Jeroen De Dauw | www.EntropyWins.wtf  |
> www.Professional.Wiki 
> Entrepreneur | Software Crafter | Speaker | Open Souce and Wikimedia
> contributor
> ~=[,,_,,]:3
>


[Wikidata] Big numbers

2019-10-07 Thread John Erling Blad
Is there any documentation of the number format used by the quantity type?
I bumped into this and had to implement the BCmath extension to handle the
numbers. The reason why I did it (apart from it being fun) is to handle some
weird unit conversions. By inspection I found there were numbers at Wikidata
that clearly could not be represented as doubles, and after a little testing I
found that they had to be implemented as some kind of big numbers. Lua does not
have big numbers, and using the numbers from quantity as a plain number
type is a coming disaster.
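
A quick illustration of that coming disaster in plain Lua 5.1, without
BCmath – Lua numbers are IEEE doubles, so integer amounts above 2^53
silently lose digits (the long amount string is a made-up example):

    print( 2^53 == 2^53 + 1 )  -- true: 2^53 + 1 is not representable as a double
    -- a long quantity amount collapses the same way:
    local amount = '+123456789012345678901234567890'
    print( tonumber( amount ) == tonumber( amount ) + 1 )  -- true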

So, is there any documentation for the quantity format anywhere? I have not
found anything. I posted a task about it; please add info there if you know
where some can be found. I suspect the format just happens to be the same as
bcmath's, and nobody really checked whether the formats were compatible, or…?

The BCmath extension can be found at
- https://www.mediawiki.org/wiki/Extension:BCmath
- https://github.com/jeblad/BCmath

There is a Vagrant role if anyone likes to test it out.


Re: [Wikidata] [Wikipedia-l] Fwd: [Wikimedia-l] Wikipedia in an abstract language

2019-01-18 Thread John Erling Blad
I have tried a couple of times to rewrite this, but it grows out of bounds
anyway. It seems to have a life of its own.

There is a book from 2000 by Robert Dale and Ehud Reiter, "Building
Natural Language Generation Systems" (ISBN 978-0-521-02451-8).

Wikibase items can be rebuilt as Plans from the type statement
(top-down) or as Constituents from the other statements (bottom-up).
The two models do not necessarily agree. This is, however, only the
overall document structure and the organization of the data; it leaves
out the really hard part – the language-specific realization.

You can probably redefine Plans and Constituents as entities – I have
toyed around with them as Lua classes – and put them into Wikidata. The
easiest way to reuse them locally would be to use a lookup structure
for fully or partly canned text, and to define rules for agreement and
inflection as part of these texts. Piecing together canned text is
hard, but easier than building full prose from the bottom up. It is
possible to define a very low-level realization for some languages,
but that is a lot harder.

The idea for the lookup of canned text is to use the text that covers most
of the available statements, while still allowing most of the remaining
statements to be covered as well. That is, some canned text might not
support a specific agreement rule; other canned text can then not reference
it, and less coverage is achieved. For example, if the direction to the sea
cannot be expressed in a canned text for Finnish, then the distance cannot
reference the direction.

To get around this I prioritized Plans and Constituents, with those
having higher priority being put first. What a person is known for
should go in front of their other work. I ordered the Plans and
Constituents chronologically to maintain causality; this can also be
called sorting. Priority tends to influence plans, and order influences
constituents. Then there is grouping, which keeps some statements
together. Length, width, and height are typically a group.

A lake can be described with individual canned texts for length, width,
and height, but those are given low priority. Then a canned text can be
made for length and height, with somewhat higher priority. An even
higher priority can be given to a canned text for all three. If all
three statements are available, the composite canned text for all of
them will be used. If only some of them exist, a lower-priority canned
text will be used.
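
A minimal sketch of that selection rule (the data layout is hypothetical,
just to make the idea concrete): pick the highest-priority canned text whose
required statements are all present.

    local cannedTexts = {
        { priority = 3, needs = { 'length', 'width', 'height' },
          text = 'The lake is $length long, $width wide, with a height of $height.' },
        { priority = 2, needs = { 'length', 'width' },
          text = 'The lake is $length long and $width wide.' },
        { priority = 1, needs = { 'length' },
          text = 'The lake is $length long.' },
    }

    local function chooseText( statements )
        local best
        for _, candidate in ipairs( cannedTexts ) do
            local usable = true
            for _, key in ipairs( candidate.needs ) do
                if statements[key] == nil then
                    usable = false
                    break
                end
            end
            if usable and ( not best or candidate.priority > best.priority ) then
                best = candidate
            end
        end
        return best and best.text
    end

    -- no height is available, so the two-part text wins over the three-part one
    print( chooseText( { length = '3 km', width = '800 m' } ) )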

Note that the book uses "canned text" a little differently.

Also note that the canned texts can be translated as ordinary message
strings. They can also be defined as a kind of entity in Wikidata.
As ordinary message strings they need additional data, but that comes
naturally if they are entities in Wikidata. My doodling put them inside each
Wikipedia, as that would make them easier to reuse from Lua modules. (And yes,
you can then override part of the ArticlePlaceholder to show the text
on the special page.)



Re: [Wikidata] [Wikipedia-l] Fwd: [Wikimedia-l] Wikipedia in an abstract language

2019-01-14 Thread John Erling Blad
An additional note: what Wikipedia urgently needs is a way to create
and reuse canned text (aka "templates"), and a way to adapt that text
to data from Wikidata. That is mostly just inflection rules, but in
some cases it involves grammar rules. Creating larger pieces of text
is much harder, especially if the text is supposed to be readable.
Jumbling sentences together, as is commonly done by various bot scripts,
does not work very well – or rather, it does not work at all.

On Mon, Jan 14, 2019 at 11:44 AM John Erling Blad  wrote:
>
> Using an abstract language as an basis for translations have been
> tried before, and is almost as hard as translating between two common
> languages.
>
> There are two really hard problems, it is the implied references and
> the cultural context. An artificial language can get rid of the
> implied references, but it tend to create very weird and unnatural
> expressions. If the cultural context is removed, then it can be
> extremely hard to put it back in, and without any cultural context it
> can be hard to explain anything.
>
> But yes, you can make an abstract language, but it won't give you any
> high quality prose.
>
> On Mon, Jan 14, 2019 at 8:09 AM Felipe Schenone  wrote:
> >
> > This is quite an awesome idea. But thinking about it, wouldn't it be 
> > possible to use structured data in wikidata to generate articles? Can't we 
> > skip the need of learning an abstract language by using wikidata?
> >
> > Also, is there discussion about this idea anywhere in the Wikimedia wikis? 
> > I haven't found any...
> >
> > On Sat, Sep 29, 2018 at 3:44 PM Pine W  wrote:
> >>
> >> Forwarding because this (ambitious!) proposal may be of interest to people
> >> on other lists. I'm not endorsing the proposal at this time, but I'm
> >> curious about it.
> >>
> >> Pine
> >> ( https://meta.wikimedia.org/wiki/User:Pine )
> >>
> >>
> >> -- Forwarded message -
> >> From: Denny Vrandečić 
> >> Date: Sat, Sep 29, 2018 at 6:32 PM
> >> Subject: [Wikimedia-l] Wikipedia in an abstract language
> >> To: Wikimedia Mailing List 
> >>
> >>
> >> Semantic Web languages allow to express ontologies and knowledge bases in a
> >> way meant to be particularly amenable to the Web. Ontologies formalize the
> >> shared understanding of a domain. But the most expressive and widespread
> >> languages that we know of are human natural languages, and the largest
> >> knowledge base we have is the wealth of text written in human languages.
> >>
> >> We looks for a path to bridge the gap between knowledge representation
> >> languages such as OWL and human natural languages such as English. We
> >> propose a project to simultaneously expose that gap, allow to collaborate
> >> on closing it, make progress widely visible, and is highly attractive and
> >> valuable in its own right: a Wikipedia written in an abstract language to
> >> be rendered into any natural language on request. This would make current
> >> Wikipedia editors about 100x more productive, and increase the content of
> >> Wikipedia by 10x. For billions of users this will unlock knowledge they
> >> currently do not have access to.
> >>
> >> My first talk on this topic will be on October 10, 2018, 16:45-17:00, at
> >> the Asilomar in Monterey, CA during the Blue Sky track of ISWC. My second,
> >> longer talk on the topic will be at the DL workshop in Tempe, AZ, October
> >> 27-29. Comments are very welcome as I prepare the slides and the talk.
> >>
> >> Link to the paper: http://simia.net/download/abstractwikipedia.pdf
> >>
> >> Cheers,
> >> Denny



Re: [Wikidata] Imperative programming in Lua, do we really want it?

2017-12-07 Thread John Erling Blad
There are some really weird modules out there; I'm not sure whether pointing
them out would make for a good discussion environment.

My wild guess is that the modules turn into an imperative style because the
libraries (including Wikibase) return fragments of large table structures. To
process the fragments you then iterate, extracting different subparts of those
tables and keeping state for whatever you infer from those calls. This creates
a lot of extracted state, and often that state lives inside calls you can't
test. Usually you can't even get to those calls without breaking the interface
*somehow*.

Perhaps something can be done by writing a few example pages on MediaWiki,
but my experience is that developers at Wikipedia (aka script kiddies like
me) do not check pages at MediaWiki; they just assume they do it
TheRightWay™. Writing a set of programming manuals can thus easily become
a complete waste of time.

No, I don't have any easy solutions.
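
For what it is worth, a rough sketch of the chainable style asked for in this
thread (a hypothetical helper, not part of the Wikibase Lua API): wrap a table
once and expose filter/map as pure calls that return new wrappers, so each
step can be unit tested without poking at hidden state.

    local Chain = {}
    Chain.__index = Chain

    function Chain.new( items )
        return setmetatable( { items = items }, Chain )
    end

    function Chain:filter( predicate )
        local kept = {}
        for _, item in ipairs( self.items ) do
            if predicate( item ) then
                kept[#kept + 1] = item
            end
        end
        return Chain.new( kept )
    end

    function Chain:map( transform )
        local mapped = {}
        for i, item in ipairs( self.items ) do
            mapped[i] = transform( item )
        end
        return Chain.new( mapped )
    end

    function Chain:totable()
        return self.items
    end

    -- each call returns a new wrapper; nothing mutates the input
    local labels = Chain.new( {
            { rank = 'preferred', label = 'Oslo' },
            { rank = 'normal', label = 'Christiania' },
        } )
        :filter( function ( s ) return s.rank == 'preferred' end )
        :map( function ( s ) return s.label end )
        :totable()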



On Wed, Dec 6, 2017 at 11:53 PM, Jeroen De Dauw <jeroended...@gmail.com>
wrote:

> Hey,
>
> While I am not up to speed with the Lua surrounding Wikidata or MediaWiki,
> I support the call for avoiding overly imperative code where possible.
>
> Most Lua code I have seen in the past (which has nothing to do with
> MediaWiki) was very imperative, procedural and statefull. Those are things
> you want to avoid if you want your code to be maintainable, easy to
> understand and testable. Since Lua supports OO and functional styles, the
> language is not an excuse for throwing well establishes software
> development practices out of the window.
>
> If the code is currently procedural, I would recommend establishing that
> new code should not be procedural and have automawted tests unless there is
> very good reason to make an exception. If some of this code is written by
> people not familiar with software development, it is also important to
> create good examples for them and provide guidance so they do not
> unknowingly copy and adopt poor practices/styles.
>
> John, perhaps you can link the code that caused you to start this thread
> so that there is something more concrete to discuss?
>
> (This is just my personal opinion, not some official statement from
> Wikimedia Deutschland)
>
> PS: I just noticed this is the Wikidata mailing list and not the
> Wikidata-tech one :(
>
> Cheers
>
> --
> Jeroen De Dauw | https://entropywins.wtf | https://keybase.io/jeroendedauw
> Software craftsmanship advocate | Developer at Wikimedia Germany
> ~=[,,_,,]:3
>
> On 6 December 2017 at 23:31, John Erling Blad <jeb...@gmail.com> wrote:
>
>> With the current Lua environment we have ended up with an imperative
>> programming style in the modules. That invites to statefull objects, which
>> does not create easilly testable libraries.
>>
>> Do we have some ideas on how to avoid this, or is it simply the way
>> things are in Lua? I would really like functional programming with
>> chainable calls, but other might want something different?
>>
>> John
>>


[Wikidata] Imperative programming in Lua, do we really want it?

2017-12-06 Thread John Erling Blad
With the current Lua environment we have ended up with an imperative
programming style in the modules. That invites stateful objects, which
do not make for easily testable libraries.

Do we have some ideas on how to avoid this, or is it simply the way things
are in Lua? I would really like functional programming with chainable
calls, but others might want something different?

John


Re: [Wikidata] An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0

2017-11-30 Thread John Erling Blad
c can damage our movement beyond this
> thread or topic. Our main strength is not our content but our community,
> and I am glad to see that many have already responded to you in such a
> measured and polite way.
>
> Peace,
>
> Markus
>
>
> On 30.11.2017 09:55, John Erling Blad wrote:
> > Licensing was discussed in the start of the project, as in start of
> > developing code for the project, and as I recall it the arguments for
> > CC0 was valid and sound. That was long before Danny started working for
> > Google.
> >
> > As I recall it was mention during first week of the project (first week
> > of april), and the duscussion reemerged during first week of
> > development. That must have been week 4 or 5 (first week of may), as the
> > delivery of the laptoppen was delayed. I was against CC0 as I expected
> > problems with reuse og external data. The arguments for CC0 convinced me.
> >
> > And yes, Denny argued for CC0 AS did Daniel and I believe Jeroen and
> > Jens did too.
>
>
>


Re: [Wikidata] An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0

2017-11-30 Thread John Erling Blad
This was added to the wrong email, sorry for that.

On Thu, Nov 30, 2017 at 11:45 AM, Luca Martinelli <martinellil...@gmail.com>
wrote:

> On 30 Nov 2017 at 09:55, "John Erling Blad" <jeb...@gmail.com> wrote:
>
> Please keep this civil and on topic!
>
> I was just pointing out that CC0 wasn't forced down our throat by Silicon
> Valley's Fifth Column supposed embodiment, that we actually discussed
> several alternatives (ODbL included, which I saw was mentioned in the
> original message of this thread) and that that several of the objections
> made here were actually founded, as several other discussions happened
> outside this ML confirmed.
>
> I'm sorry if it appeared I wanted to start a brawl, it wasn't the case.
> For this misunderstanding, I'm sorry.
>
> L.
>


Re: [Wikidata] An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0

2017-11-30 Thread John Erling Blad
Sorry for the sprelling errojs, my post was written on a cellphone set to
Norwegian.

On Thu, Nov 30, 2017 at 9:55 AM, John Erling Blad <jeb...@gmail.com> wrote:

> Please keep this civil and on topic!
>
> Licensing was discussed in the start of the project, as in start of
> developing code for the project, and as I recall it the arguments for CC0
> was valid and sound. That was long before Danny started working for Google.
>
> As I recall it was mention during first week of the project (first week of
> april), and the duscussion reemerged during first week of development. That
> must have been week 4 or 5 (first week of may), as the delivery of the
> laptoppen was delayed. I was against CC0 as I expected problems with reuse
> og external data. The arguments for CC0 convinced me.
>
> And yes, Denny argued for CC0 AS did Daniel and I believe Jeroen and Jens
> did too.
>
> Argument is pretty simple: Part A has some data A and claim license A.
> Part B has some data B and claim license B. Both license A and  license B
> are sticky, this later data C that use an aggregation of A and B must
> satisfy both license A and license B. That is not viable.
>
> Moving forward to a safe, non-sticky license seems to be the only viable
> solution, and this leads to CC0.
>
> Feel free to discuss the merrit of our choice but do not use personal
> attacs. Thank you.
>
> On Thu, 30 Nov 2017 at 09:11, Luca Martinelli <
> martinellil...@gmail.com> wrote:
>
>> Oh, and by the way, ODbL was considered as a potential license, but I
>> recall that that license could have been incompatible for reuse with CC
>> BY-SA 3.0. It was actually a point of discussion with the Italian
>> OpenStreetMap community back in 2013, when I first presented at the OSM-IT
>> meeting the possibility of a collaboration between WD and OSM.
>>
>> L.
>>
>> On 30 Nov 2017 at 08:57, "Luca Martinelli" <martinellil...@gmail.com>
>> wrote:
>>
>>> I basically stopped reading this email after the first attack to Denny.
>>>
>>> I was there since the beginning, and I do recall the *extensive*
>>> discussion about what license to use. CC0 was chosen, among other things,
>>> because of the moronic EU rule about database rights, that CC 3.0 licenses
>>> didn't allow us to counter - please remember that 4.0 were still under
>>> discussion, and we couldn't afford the luxury of waiting for 4.0 to come
>>> out before publishing Wikidata.
>>>
>>> And possibly next time provide a TL;DR version of your email at the top.
>>>
>>> Cheers,
>>>
>>> L.
>>>
>>>
>>> On 29 Nov 2017 at 22:46, "Mathieu Stumpf Guntz" <
>>> psychosl...@culture-libre.org> wrote:
>>>
>>>> Saluton ĉiuj,
>>>>
>>>> I forward here the message I initially posted on the Meta Tremendous
>>>> Wiktionary User Group talk page
>>>> <https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0>,
>>>> because I'm interested to have a wider feedback of the community on this
>>>> point. Whether you think that my view is completely misguided or that I
>>>> might have a few relevant points, I'm extremely interested to know it, so
>>>> please be bold.
>>>>
>>>> Before you consider digging further in this reading, keep in mind that
>>>> I stay convinced that Wikidata is a wonderful project and I wish it a
>>>> bright future full of even more amazing things than what it already brung
>>>> so far. My sole concern is really a license issue.
>>>>
>>>> Bellow is a copy/paste of the above linked message:
>>>>
>>>> Thank you Lydia Pintscher
>>>> <https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29> for
>>>> taking the time to answer. Unfortunately this answer
>>>> <https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0>
>>>> miss too many important points to solve all concerns which have been 
>>>> raised.
>>>>
>>>> Notably, there is still no beginning of hint in it about where the
>>>> decision of using CC0 exclusively for Wikidata came from. But as this
>>>> inquiry on the topic
>>>> <https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-0_de_Wikidata,_origine_du_choix,_enjeux,_et_prospections_sur_les_aspects_de_gouvernance_communautaire_et_d%E2%80%99%C3%A9quit%C3%A9_contributive>
>>>> advance,

Re: [Wikidata] An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0

2017-11-30 Thread John Erling Blad
A single-property licensing scheme would allow storage of data, but it might
or might not allow reuse of the licensed data together with other data.
Remember that all entries on the servers might be part of a mashup with
all other entries.

On Thu, Nov 30, 2017 at 9:55 AM, John Erling Blad <jeb...@gmail.com> wrote:

> Please keep this civil and on topic!
>
> Licensing was discussed in the start of the project, as in start of
> developing code for the project, and as I recall it the arguments for CC0
> was valid and sound. That was long before Danny started working for Google.
>
> As I recall it was mention during first week of the project (first week of
> april), and the duscussion reemerged during first week of development. That
> must have been week 4 or 5 (first week of may), as the delivery of the
> laptoppen was delayed. I was against CC0 as I expected problems with reuse
> og external data. The arguments for CC0 convinced me.
>
> And yes, Denny argued for CC0 AS did Daniel and I believe Jeroen and Jens
> did too.
>
> Argument is pretty simple: Part A has some data A and claim license A.
> Part B has some data B and claim license B. Both license A and  license B
> are sticky, this later data C that use an aggregation of A and B must
> satisfy both license A and license B. That is not viable.
>
> Moving forward to a safe, non-sticky license seems to be the only viable
> solution, and this leads to CC0.
>
> Feel free to discuss the merrit of our choice but do not use personal
> attacs. Thank you.
>
> Den tor. 30. nov. 2017, 09.11 skrev Luca Martinelli <
> martinellil...@gmail.com>:
>
>> Oh, and by the way, ODbL was considered as a potential license, but I
>> recall that that license could have been incompatible for reuse with CC
>> BY-SA 3.0. It was actually a point of discussion with the Italian
>> OpenStreetMap community back in 2013, when I first presented at the OSM-IT
>> meeting the possibility of a collaboration between WD and OSM.
>>
>> L.
>>
>> On 30 Nov 2017 at 08:57, "Luca Martinelli" <martinellil...@gmail.com>
>> wrote:
>>
>>> I basically stopped reading this email after the first attack to Denny.
>>>
>>> I was there since the beginning, and I do recall the *extensive*
>>> discussion about what license to use. CC0 was chosen, among other things,
>>> because of the moronic EU rule about database rights, that CC 3.0 licenses
>>> didn't allow us to counter - please remember that 4.0 were still under
>>> discussion, and we couldn't afford the luxury of waiting for 4.0 to come
>>> out before publishing Wikidata.
>>>
>>> And possibly next time provide a TL;DR version of your email at the top.
>>>
>>> Cheers,
>>>
>>> L.
>>>
>>>
>>> On 29 Nov 2017 at 22:46, "Mathieu Stumpf Guntz" <
>>> psychosl...@culture-libre.org> wrote:
>>>
>>>> Saluton ĉiuj,
>>>>
>>>> I forward here the message I initially posted on the Meta Tremendous
>>>> Wiktionary User Group talk page
>>>> <https://meta.wikimedia.org/wiki/Talk:Wiktionary/Tremendous_Wiktionary_User_Group#An_answer_to_Lydia_general_thinking_about_Wikidata_and_CC-0>,
>>>> because I'm interested to have a wider feedback of the community on this
>>>> point. Whether you think that my view is completely misguided or that I
>>>> might have a few relevant points, I'm extremely interested to know it, so
>>>> please be bold.
>>>>
>>>> Before you consider digging further in this reading, keep in mind that
>>>> I stay convinced that Wikidata is a wonderful project and I wish it a
>>>> bright future full of even more amazing things than what it already brung
>>>> so far. My sole concern is really a license issue.
>>>>
>>>> Bellow is a copy/paste of the above linked message:
>>>>
>>>> Thank you Lydia Pintscher
>>>> <https://meta.wikimedia.org/wiki/User:Lydia_Pintscher_%28WMDE%29> for
>>>> taking the time to answer. Unfortunately this answer
>>>> <https://www.wikidata.org/wiki/User:Lydia_Pintscher_%28WMDE%29/CC-0>
>>>> miss too many important points to solve all concerns which have been 
>>>> raised.
>>>>
>>>> Notably, there is still no beginning of hint in it about where the
>>>> decision of using CC0 exclusively for Wikidata came from. But as this
>>>> inquiry on the topic
>>>> <https://en.wikiversity.org/wiki/fr:Recherche:La_licence_CC-

Re: [Wikidata] An answer to Lydia Pintscher regarding its considerations on Wikidata and CC-0

2017-11-30 Thread John Erling Blad
Please keep this civil and on topic!

Licensing was discussed at the start of the project – as in, at the start of
developing code for the project – and as I recall the arguments for CC0
were valid and sound. That was long before Denny started working for Google.

As I recall it was mentioned during the first week of the project (first week
of April), and the discussion re-emerged during the first week of development.
That must have been week 4 or 5 (first week of May), as the delivery of the
laptop was delayed. I was against CC0 as I expected problems with the reuse
of external data. The arguments for CC0 convinced me.

And yes, Denny argued for CC0, as did Daniel, and I believe Jeroen and Jens
did too.

The argument is pretty simple: party A has some data A and claims license A.
Party B has some data B and claims license B. Both license A and license B are
sticky, so later data C that uses an aggregation of A and B must satisfy
both license A and license B. That is not viable.

Moving forward to a safe, non-sticky license seems to be the only viable
solution, and this leads to CC0.

Feel free to discuss the merit of our choice, but do not use personal
attacks. Thank you.

On Thu, 30 Nov 2017 at 09:11, Luca Martinelli <
martinellil...@gmail.com> wrote:

> Oh, and by the way, ODbL was considered as a potential license, but I
> recall that that license could have been incompatible for reuse with CC
> BY-SA 3.0. It was actually a point of discussion with the Italian
> OpenStreetMap community back in 2013, when I first presented at the OSM-IT
> meeting the possibility of a collaboration between WD and OSM.
>
> L.
>
> On 30 Nov 2017 at 08:57, "Luca Martinelli"  wrote:
>
>> I basically stopped reading this email after the first attack to Denny.
>>
>> I was there since the beginning, and I do recall the *extensive*
>> discussion about what license to use. CC0 was chosen, among other things,
>> because of the moronic EU rule about database rights, that CC 3.0 licenses
>> didn't allow us to counter - please remember that 4.0 were still under
>> discussion, and we couldn't afford the luxury of waiting for 4.0 to come
>> out before publishing Wikidata.
>>
>> And possibly next time provide a TL;DR version of your email at the top.
>>
>> Cheers,
>>
>> L.
>>
>>
>>> On 29 Nov 2017 at 22:46, "Mathieu Stumpf Guntz" <
>>> psychosl...@culture-libre.org> wrote:
>>
>>> Saluton ĉiuj,
>>>
>>> I forward here the message I initially posted on the Meta Tremendous
>>> Wiktionary User Group talk page
>>> ,
>>> because I'm interested to have a wider feedback of the community on this
>>> point. Whether you think that my view is completely misguided or that I
>>> might have a few relevant points, I'm extremely interested to know it, so
>>> please be bold.
>>>
>>> Before you consider digging further in this reading, keep in mind that I
>>> stay convinced that Wikidata is a wonderful project and I wish it a bright
>>> future full of even more amazing things than what it already brung so far.
>>> My sole concern is really a license issue.
>>>
>>> Bellow is a copy/paste of the above linked message:
>>>
>>> Thank you Lydia Pintscher
>>>  for
>>> taking the time to answer. Unfortunately this answer
>>> 
>>> miss too many important points to solve all concerns which have been raised.
>>>
>>> Notably, there is still no beginning of hint in it about where the
>>> decision of using CC0 exclusively for Wikidata came from. But as this
>>> inquiry on the topic
>>> 
>>> advance, an answer is emerging from it. It seems that Wikidata choice
>>> toward CC0 was heavily influenced by Denny Vrandečić, who – to make it
>>> short – is now working in the Google Knowledge Graph team. Also it worth
>>> noting that Google funded a quarter of the initial development work.
>>> Another quarter came from the Gordon and Betty Moore Foundation,
>>> established by Intel co-founder. And half the money came from Microsoft
>>> co-founder Paul Allen's Institute for Artificial Intelligence (AI2)[1]
>>> .
>>> To state it shortly in a conspirational fashion, Wikidata is the puppet
>>> trojan horse of big tech hegemonic companies into the realm of Wikimedia.
>>> For a less tragic, more argumentative version, please see the research
>>> project (work in progress, only chapter 1 is in good enough shape, and it's
>>> only available in French so far). Some proofs that this claim is completely
>>> wrong are welcome, 

[Wikidata] Renaming of labels, copy to alias

2017-11-27 Thread John Erling Blad
Would it be possible to make an implicit copy of a label to an alias before
the label is changed? It would then be possible to keep on using a label as
an identifier in Lua code as long as it doesn't conflict with other
identifiers within the same item, thus lowering the maintenance load. More
importantly, it would lessen the number of violations during a name change.
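
A hypothetical sketch of what the implicit copy would buy Lua callers
(simplified to a single language, and not the real Wikibase entity
structure): the old label keeps resolving because it survives as an alias.

    local function findByName( entities, name )
        for _, entity in ipairs( entities ) do
            if entity.label == name then
                return entity.id
            end
            for _, alias in ipairs( entity.aliases or {} ) do
                if alias == name then
                    return entity.id
                end
            end
        end
        return nil
    end

    local entities = { { id = 'P2046', label = 'area', aliases = {} } }
    -- rename the label, but copy the old one to the aliases first
    entities[1].aliases[#entities[1].aliases + 1] = entities[1].label
    entities[1].label = 'surface area'
    print( findByName( entities, 'area' ) )  -- still resolves to P2046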


Re: [Wikidata] [wikicite-discuss] Cleaning up bibliographic collections in Wikidata

2017-11-24 Thread John Erling Blad
Implicit heterogeneous unordered containers where the members see a
homogeneous parent. The member properties should be transitive to avoid the
maintenance burden, like a "tracking property", and also to make the parent
item manageable.

I can't see anything that needs any kind of special structure at the entity
level. I'm not even sure whether we need a new container for this; claims are
already unordered containers.

On Sat, Nov 25, 2017 at 1:25 AM, Andy Mabbett 
wrote:

> On 24 November 2017 at 23:30, Dario Taraborelli
>  wrote:
>
> > I'd like to propose a fairly simple solution and hear your feedback on
> > whether it makes sense to implement it as is or with some modifications.
> >
> > create a Wikidata class called "Wikidata item collection" [Q-X]
>
> This sounds like Wikimedia categories, as used on Wikipedia and
> Wikimedia Commons.
>
> --
> Andy Mabbett
> @pigsonthewing
> http://pigsonthewing.org.uk
>


[Wikidata-tech] Tabular data and its limits

2017-11-22 Thread John Erling Blad
I just checked the tabular data format, as I want to use it for JSON-stat [2].
Unfortunately I find that JSON tabular data is defined such that I can't
use it for JSON-stat. Why these limits? I wonder, have those coding this
solution investigated how JSON is used for statistics at all? The way it is
now I can't even remap some (a lot) of the available datasets, as they are
multidimensional, while tabular data is two-dimensional by nature.

My guess is that tabular data is an attempt to force something into a
two-dimensional presentation even if it does not fit the problem. Can
anyone explain or clarify?
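
To make the mismatch concrete, here is an abridged JSON-stat-style dataset
written as a Lua table (field names recalled from the json-stat.org examples,
so treat them as an approximation). Three dimensions are flattened into one
value array via "id" and "size"; a two-dimensional tabular format can only
hold this by pivoting one dimension into columns or by repeating key columns.

    local dataset = {
        class = 'dataset',
        id = { 'region', 'sex', 'year' },   -- three dimensions …
        size = { 2, 2, 3 },                 -- … with these cardinalities
        value = {                           -- 2 x 2 x 3 = 12 cells, row-major
            101, 103, 108,  95,  97,  99,
            201, 204, 210, 190, 193, 197,
        },
    }

    -- index into the flattened array (0-based coordinates)
    local function cell( r, s, y )
        return dataset.value[ ( r * dataset.size[2] + s ) * dataset.size[3] + y + 1 ]
    end

    print( cell( 1, 0, 2 ) )  -- 210: region 1, sex 0, year 2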

Before anyone starts to argue about the use of JSON-stat, I will point out that
it is in use by several statistical bureaus. The list includes Statistics
Norway, the UK's Office for National Statistics, Statistics Sweden, Statistics
Denmark, Instituto Galego de Estatística, the Central Statistics Office of
Ireland, the United Nations Economic Commission for Europe and Eurostat. In
particular, Statistics Norway provides an API console where statistics can be
extracted, composed, and exported as JSON-stat.[1]

[1] http://www.ssb.no/en/omssb/tjenester-og-verktoy/api/px-api
[2] https://json-stat.org/

John Erling Blad
/jeblad


Re: [Wikidata] Next IRC office hour on November 14th (today!)

2017-11-15 Thread John Erling Blad
Not sure why, but I usually get these emails several days after the meeting
has taken place. Lucas's notice came in yesterday, the 14th, and describes a
meeting that was supposed to happen on the 11th. The final note in the email,
"See you there in 40 minutes", is nice, but my time travel device is broken.

On Tue, Nov 14, 2017 at 8:18 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Tue, Nov 14, 2017 at 6:18 PM, Lucas Werkmeister
>  wrote:
> > Hello,
> >
> > our next Wikidata IRC office hour will take place on November 11th, 18:00
> > UTC (19:00 in Berlin), on the channel #wikimedia-office (connect).
> >
> > During one hour, you’ll be able to chat with the development team about
> the
> > past, current and future projects, and ask any question you want.
> >
> > (Sorry for the short notice! It looks like we forgot to send this email
> > earlier.)
> >
> > See you there in 40 minutes!
>
> For everyone who couldn't make it here is the log:
> https://tools.wmflabs.org/meetbot/wikimedia-office/2017/wikimedia-office.2017-11-14-18.00.log.html
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
>


Re: [Wikidata] Wikimedian in Residence position at University of Virginia

2017-11-10 Thread John Erling Blad
They have some darn interesting research projects!
No, I can't apply… :(

On Fri, Nov 10, 2017 at 4:40 AM, Daniel Mietchen <
daniel.mietc...@googlemail.com> wrote:

> Dear all,
>
> I'm happy to announce that a one-year position for a Wikimedian in
> Residence is open in Charlottesville at the Data Science Institute
> (DSI) at the University of Virginia (UVA).
>
> It is aimed at fostering the interaction between the university -
> students, researchers, librarians, research administrators and others
> - and the Wikimedia communities and platforms. As such, the project
> will work across Wikimedia projects and UVA subdivisions, and
> experience in such contexts will be valued.
>
> More details about the position via
> - https://careers.insidehighered.com/job/1471604/wikimedian-in-residence/
> - http://www.my.jobs/charlottesville-va/wikimedian-in-residence/
> 29f03442637b4cc3846be0d033afb665/job/
> .
>
> For more details about the institute, see
> http://dsi.virginia.edu/ .
>
> I am working for the DSI (as a researcher) and shall be happy to
> address any questions or suggestions on the matter (including
> collaboration with other Wikimedian in Residence projects), preferably
> on-wiki or via my work email (in CC).
>
> Please feel free to pass this on to your networks.
>
> Thanks and cheers,
>
> Daniel
>


Re: [Wikidata] Coordinate precision in Wikidata, RDF & query service

2017-11-05 Thread John Erling Blad
Not sure if I would go for it, but…

"Precision for the location of the center should be one percent of the
square root of the area covered."

Oslo covers nearly 1000 km²; that would give 1 % of 32 km, i.e. about 300
meters, or roughly 10 arc seconds.
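
A small sketch of that rule of thumb (my own numbers, using 1° of latitude
≈ 111 km):

    -- precision ≈ 1 % of the square root of the covered area
    local function suggestedPrecision( areaKm2 )
        local metres = 0.01 * math.sqrt( areaKm2 ) * 1000
        return metres, metres / 111000   -- metres and degrees of latitude
    end

    local metres, degrees = suggestedPrecision( 1000 )  -- Oslo, ~1000 km²
    print( metres, degrees )  -- ~316 m, ~0.0028°, i.e. roughly 10 arc seconds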

On Mon, Nov 6, 2017 at 2:50 AM, John Erling Blad <jeb...@gmail.com> wrote:

> Glittertinden, a mountain in Norway, has a geopos 61.651222 N 8.557492 E,
> alternate geopos 6835406.62, 476558.22 (EU89, UTM32).
>
> Some of the mountains are measured to within a millimeter in elevation.
> For example Ørneflag is measured to be at 1242.808 meter, with a position
> 6705530.826, 537607.272 (EU89, UTM32) alternate geopos 6717133.02, 208055.24
> (EU89, UTM33). This is on a bolt on the top of the mountain.
> There is an on-going project to map the country within 1x1 meter and
> elevation about 0.2 meter.
>
> One arc second is about 30 meters, so five digits after the decimal point
> of a degree gives about 1 meter of lateral precision.
>
> Geopositions aren't a fixed thing; there are quite large tidal deformations
> (Earth tides), and modelling and estimating them is an important research
> field. The deformations can be as large as ~0.3 meter. (From long ago, ask
> someone working on this.) Position estimates are good to less than 1 cm, but
> I have heard better numbers.
>
> All geopositions should have a reference datum; without one they are pretty
> useless when the precision is high. An easy fix could be to use standard profiles
> with easy to recognize names, like "GPS", and limit the precision in that
> case to two digits after decimal point on an arc second.
>
> Note that precision in longitude will depend on actual latitude.
>
> On Fri, Sep 1, 2017 at 9:43 PM, Peter F. Patel-Schneider <
> pfpschnei...@gmail.com> wrote:
>
>> The GPS unit on my boat regularly claims an estimated position error of 4
>> feet after it has acquired its full complement of satellites.  This is a
>> fairly new mid-price GPS unit using up to nine satellites and WAAS.  So my
>> recreational GPS supposedly obtains fifth-decimal-place accuracy.  It was
>> running under an unobstructed sky, which is common when boating.  Careful
>> use of a good GPS unit should be able to achieve this level of accuracy on
>> land as well.
>>
>> From http://www.gps.gov/systems/gps/performance/accuracy/ the raw
>> accuracy
>> of the positioning information from a satellite is less than 2.4 feet 95%
>> of
>> the time.  The accuracy reported by a GPS unit is degraded by atmospheric
>> conditions; false signals, e.g., bounces; and the need to determine
>> position
>> by intersecting the raw data from several satellites.  Accuracy can be
>> improved by using more satellites and multiple frequencies and by
>> comparing to a signal from a receiver at a known location.
>>
>> The web page above claims that accuracy can be improved to a few
>> centimeters
>> in real time and down to the millimeter level if a device is left in the
>> same place for a long period of time.  I think that these last two
>> accuracies require a close-by receiver at a known location and correspond
>> to what is said in [4].
>>
>> peter
>>
>>
>>
>> On 08/30/2017 06:53 PM, Nick Wilson (Quiddity) wrote:
>> > On Tue, Aug 29, 2017 at 2:13 PM, Stas Malyshev <smalys...@wikimedia.org>
>> wrote:
>> >> [...] Would four decimals
>> >> after the dot be enough? According to [4] this is what commercial GPS
>> >> device can provide. If not, why and which accuracy would be
>> appropriate?
>> >>
>> >
>> > I think that should be 5 decimals for commercial GPS, per that link?
>> > It also suggests that "The sixth decimal place is worth up to 0.11 m:
>> > you can use this for laying out structures in detail, for designing
>> > landscapes, building roads. It should be more than good enough for
>> > tracking movements of glaciers and rivers. This can be achieved by
>> > taking painstaking measures with GPS, such as differentially corrected
>> > GPS."
>> >
>> > Do we hope to store datasets around glacier movement? It seems
>> > possible. (We don't seem to currently
>> > https://www.wikidata.org/wiki/Q770424 )
>> >
>> > I skimmed a few search results, and found 7 (or 15) decimals given in
>> > one standard, but the details are beyond my understanding:
>> > http://resources.esri.com/help/9.3/arcgisengine/java/gp_tool
>> ref/geoprocessing_environments/about_coverage_precision.htm
>> > https://stackoverflow.com/questions/1947481/how-many-signifi
>> cant-digits-should-i-store-in-my-database-f

Re: [Wikidata] Coordinate precision in Wikidata, RDF & query service

2017-11-05 Thread John Erling Blad
Glittertinden, a mountain in Norway, has a geopos 61.651222 N 8.557492 E,
alternate geopos 6835406.62, 476558.22 (EU89, UTM32).

Some of the mountains are measured to within a millimeter in elevation. For
example Ørneflag is measured to be at 1242.808 meter, with a position
6705530.826, 537607.272 (EU89, UTM32) alternate geopos 6717133.02, 208055.24
(EU89, UTM33). This is on a bolt on the top of the mountain.
There is an on-going project to map the country within 1x1 meter and
elevation about 0.2 meter.

One arc second is about 30 meters, so five digits after the decimal point of
a degree gives about 1 meter of lateral precision.

Geopositions aren't a fixed thing; there are quite large tidal deformations
(Earth tides), and modelling and estimating them is an important research
field. The deformations can be as large as ~0.3 meter. (From long ago, ask
someone working on this.) Position estimates are good to less than 1 cm, but I
have heard better numbers.

All geopositions should have a reference datum; without one they are pretty
useless when the precision is high. An easy fix could be to use standard profiles
with easy to recognize names, like "GPS", and limit the precision in that
case to two digits after decimal point on an arc second.

Note that precision in longitude will depend on actual latitude.

On Fri, Sep 1, 2017 at 9:43 PM, Peter F. Patel-Schneider <
pfpschnei...@gmail.com> wrote:

> The GPS unit on my boat regularly claims an estimated position error of 4
> feet after it has acquired its full complement of satellites.  This is a
> fairly new mid-price GPS unit using up to nine satellites and WAAS.  So my
> recreational GPS supposedly obtains fifth-decimal-place accuracy.  It was
> running under an unobstructed sky, which is common when boating.  Careful
> use of a good GPS unit should be able to achieve this level of accuracy on
> land as well.
>
> From http://www.gps.gov/systems/gps/performance/accuracy/ the raw accuracy
> of the positioning information from a satellite is less than 2.4 feet 95%
> of
> the time.  The accuracy reported by a GPS unit is degraded by atmospheric
> conditions; false signals, e.g., bounces; and the need to determine
> position
> by intersecting the raw data from several satellites.  Accuracy can be
> improved by using more satellites and multiple frequencies and by
> comparing to a signal from a receiver at a known location.
>
> The web page above claims that accuracy can be improved to a few
> centimeters
> in real time and down to the millimeter level if a device is left in the
> same place for a long period of time.  I think that these last two
> accuracies require a close-by receiver at a known location and correspond
> to what is said in [4].
>
> peter
>
>
>
> On 08/30/2017 06:53 PM, Nick Wilson (Quiddity) wrote:
> > On Tue, Aug 29, 2017 at 2:13 PM, Stas Malyshev 
> wrote:
> >> [...] Would four decimals
> >> after the dot be enough? According to [4] this is what commercial GPS
> >> device can provide. If not, why and which accuracy would be appropriate?
> >>
> >
> > I think that should be 5 decimals for commercial GPS, per that link?
> > It also suggests that "The sixth decimal place is worth up to 0.11 m:
> > you can use this for laying out structures in detail, for designing
> > landscapes, building roads. It should be more than good enough for
> > tracking movements of glaciers and rivers. This can be achieved by
> > taking painstaking measures with GPS, such as differentially corrected
> > GPS."
> >
> > Do we hope to store datasets around glacier movement? It seems
> > possible. (We don't seem to currently
> > https://www.wikidata.org/wiki/Q770424 )
> >
> > I skimmed a few search results, and found 7 (or 15) decimals given in
> > one standard, but the details are beyond my understanding:
> > http://resources.esri.com/help/9.3/arcgisengine/java/gp_
> toolref/geoprocessing_environments/about_coverage_precision.htm
> > https://stackoverflow.com/questions/1947481/how-many-
> significant-digits-should-i-store-in-my-database-for-a-gps-coordinate
> > https://stackoverflow.com/questions/7167604/how-
> accurately-should-i-store-latitude-and-longitude
> >
> >> [4]
> >> https://gis.stackexchange.com/questions/8650/measuring-
> accuracy-of-latitude-and-longitude
> >
>


Re: [Wikidata] Encoders/feature extractors for neural nets

2017-10-02 Thread John Erling Blad
You might view (my) problem as an embedding for words (and their fragments)
driven by valued statements (those you discard), and then inverting this
(learned encoder) into a language model. Thus, when describing an object, it
would be possible to choose better words (lexical choice in natural language
generation).

On Mon, Oct 2, 2017 at 5:00 PM, <f...@imm.dtu.dk> wrote:

> I have done some work on converting Wikidata items and properties to a
> low-dimensional representation (graph embedding).
>
> A webservice with a "most-similar" functionality based on computation in
> the low-dimensional space is running from https://tools.wmflabs.org/wemb
> edder/most-similar/
>
> A query may look like:
>
> https://tools.wmflabs.org/wembedder/most-similar/Q20#language=en
>
> It is based on a simple Gensim model https://github.com/fnielsen/wembedder
> and could probably be improved.
>
> It is described in http://www2.imm.dtu.dk/pubdb/v
> iews/edoc_download.php/7011/pdf/imm7011.pdf
>
> It is not embedding statements but rather individual items.
>
>
> There is general research on graph embedding. I have added some of the
> scientific articles to Wikidata. You can see them with Scholia:
>
> https://tools.wmflabs.org/scholia/topic/Q32081746
>
>
> best regards
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/
>
>
> On 09/27/2017 02:14 PM, John Erling Blad wrote:
>
>> The most important thing for my problem would be to encode quantity and
>> geopos. The test case is lake sizes to encode proper localized descriptions.
>>
>> Unless someone already have a working solution I would encode this as
>> sparse logarithmic vectors, probably also with log of pairwise differences.
>>
>> Encoding of qualifiers is interesting, but would require encoding of a
>> topic map, and that adds an additional layer of complexity.
>>
>> How to encode the values are not so much the problem, but avoiding
>> reimplementing this yet another time… ;)
>>
>> On Wed, Sep 27, 2017 at 1:23 PM, Thomas Pellissier Tanon <
>> tho...@pellissier-tanon.fr <mailto:tho...@pellissier-tanon.fr>> wrote:
>>
>> Just an idea of a very sparse but hopefully not so bad encoding (I
>> have not actually tested it).
>>
>> NB: I am going to use a lot the terms defined in the glossary [1].
>>
>> A value could be encoded by a vector:
>> - for entity ids it is a vector V that have the dimension of the
>> number of existing entities such that V[q] = 1 if, and only if, it
>> is the entity q and V[q] = 0 if not.
>> - for time : a vector with year, month, day, hours, minutes,
>> seconds, is_precision_year, is_precision_month, ..., is_gregorian,
>> is_julian (or something similar)
>> - for geo coordinates latitude, longitude, is_earth, is_moon...
>> - string/language strings: an encoding depending on your use case
>> ...
>> Example : To encode "Q2" you would have the vector {0,1,0}
>> To encode the year 2000 you would have {2000,0...,
>> is_precision_decade =
>> 0,is_precision_year=1,is_precision_month=0,...,is_gregorian=true,...}
>>
>> To encode a snak you build a big vector by concatenating the vector
>> of the value if it is P1, if it is P2... (you use the property
>> datatype to pick a good vector shape) + you add two cells per
>> property to encode is_novalue, is_somevalue. To encode "P31: Q5" you
>> would have a vector V = {0,,0,0,0,0,1,0,} with 1 only for
>>  V[P31_offset + Q5_offset]
>>
>> To encode a claim you could concatenate the main snak vector + the
>> qualifiers vectors that is the merge of the snak vector for all
>> qualifiers (i.e. you build the vector for all snak and you sum them)
>> such that the qualifier vectors encode all qualifiers at the same
>> time. it allows to check that a qualifiers is set just by picking
>> the right cell in the vector. But it will do bad things if there are
>> two qualifiers with the same property and having a datatype like
>> time or geocoordinates. But I don't think it really a problem.
>> Example: to encode the claim with "P31: Q5" main snak and qualifiers
>> "P42: Q42, P42: Q44" we would have a vector V such that V[P31_offset
>> + Q5_offset] = 1, V[qualifiers_offset + P42_offset + Q42_offset] = 1
>> and V[qualifiers_offset + P42_offset + Q44_offset] = 1 and 0
>> elsewhere.
>>
>> I am not sure how to encode statements references (merge all of them
>> and encode it just like th

Re: [Wikidata] Encoders/feature extractors for neural nets

2017-10-02 Thread John Erling Blad
A "watercourse" is more related to the countryside than the ocean, even if
we tend to associate it with its destination.

Anyway, from the paper "One should not expect Wembedder to perform at the
state of the art level, and a comparison with the Wordsim-353 dataset for
semantic relatedness evaluation shows poor performance with Pearson and
Spearman correlations on just 0.13."

It is interesting anyhow.

On Tue, Oct 3, 2017 at 3:03 AM, Thad Guidry  wrote:

> Similar how ?
> The training seems to have made
> ​a ​
> few wrong assumptions
> ​, but I might be wrong since I don't know your assumptions while
> training.​
>
> ​​
> https://tools.wmflabs.org/wembedder/most-similar/Q355304#language=en
>
> You miss "ocean" on this, and pickup "farmhouse", for instance
> ​ ?
>
> Do Wikipedia Categories​ or 'subclass of' affect anything here ?
> ​
>
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Encoders/feature extractors for neural nets

2017-09-27 Thread John Erling Blad
The most important thing for my problem would be to encode quantity and
geopos. The test case is lake sizes to encode proper localized descriptions.

Unless someone already has a working solution I would encode this as
sparse logarithmic vectors, probably also with log of pairwise differences.

Encoding of qualifiers is interesting, but would require encoding of a
topic map, and that adds an additional layer of complexity.

How to encode the values is not so much the problem, but avoiding
reimplementing this yet another time… ;)
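
To be concrete, something along these lines is roughly what I have in mind;
a minimal sketch only, where the bucket size, the offsets and the example
numbers are assumptions, not a finished feature layout:

    import math

    def encode_quantity(value, n_buckets=32, base=10.0):
        # Sparse one-hot bucket over the log of the magnitude, plus a sign
        # feature, returned as {feature_index: value} for a sparse input layer.
        features = {0: 1.0 if value >= 0 else -1.0}
        magnitude = math.log(abs(value) + 1.0, base)
        bucket = min(int(magnitude * 4), n_buckets - 1)   # quarter-decade buckets
        features[1 + bucket] = 1.0
        return features

    def encode_geopos(lat, lon, offset=1 + 32):
        # Plain scaled latitude/longitude, placed after the quantity block.
        return {offset: lat / 90.0, offset + 1: lon / 180.0}

    # Example: a lake area of 365.0 (km^2) at roughly (60.7, 10.9).
    sparse = {**encode_quantity(365.0), **encode_geopos(60.7, 10.9)}

The log of pairwise differences would then just be more buckets further out
in the same sparse vector.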

On Wed, Sep 27, 2017 at 1:23 PM, Thomas Pellissier Tanon <
tho...@pellissier-tanon.fr> wrote:

> Just an idea of a very sparse but hopefully not so bad encoding (I have
> not actually tested it).
>
> NB: I am going to use a lot the terms defined in the glossary [1].
>
> A value could be encoded by a vector:
> - for entity ids it is a vector V that have the dimension of the number of
> existing entities such that V[q] = 1 if, and only if, it is the entity q
> and V[q] = 0 if not.
> - for time : a vector with year, month, day, hours, minutes, seconds,
> is_precision_year, is_precision_month, ..., is_gregorian, is_julian (or
> something similar)
> - for geo coordinates latitude, longitude, is_earth, is_moon...
> - string/language strings: an encoding depending on your use case
> ...
> Example : To encode "Q2" you would have the vector {0,1,0}
> To encode the year 2000 you would have {2000,0..., is_precision_decade =
> 0,is_precision_year=1,is_precision_month=0,...,is_gregorian=true,...}
>
> To encode a snak you build a big vector by concatenating the vector of the
> value if it is P1, if it is P2... (you use the property datatype to pick a
> good vector shape) + you add two cells per property to encode is_novalue,
> is_somevalue. To encode "P31: Q5" you would have a vector V =
> {0,,0,0,0,0,1,0,} with 1 only for  V[P31_offset + Q5_offset]
>
> To encode a claim you could concatenate the main snak vector + the
> qualifiers vectors that is the merge of the snak vector for all qualifiers
> (i.e. you build the vector for all snak and you sum them) such that the
> qualifier vectors encode all qualifiers at the same time. it allows to
> check that a qualifiers is set just by picking the right cell in the
> vector. But it will do bad things if there are two qualifiers with the same
> property and having a datatype like time or geocoordinates. But I don't
> think it really a problem.
> Example: to encode the claim with "P31: Q5" main snak and qualifiers "P42:
> Q42, P42: Q44" we would have a vector V such that V[P31_offset + Q5_offset]
> = 1, V[qualifiers_offset + P42_offset + Q42_offset] = 1 and
> V[qualifiers_offset + P42_offset + Q44_offset] = 1 and 0 elsewhere.
>
> I am not sure how to encode statements references (merge all of them and
> encode it just like the qualifiers vector is maybe a first step but is bad
> if we have multiple references).  For the rank you just need 3 booleans
> is_preferred, is_normal and is_deprecated.
>
> Cheers,
>
> Thomas
>
> [1] https://www.wikidata.org/wiki/Wikidata:Glossary
>
>
> > Le 27 sept. 2017 à 12:41, John Erling Blad <jeb...@gmail.com> a écrit :
> >
> > Is there anyone that has done any work on how to encode statements as
> features for neural nets? I'm mostly interested in sparse encoders for
> online training of live networks.
> >
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
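
For anyone who wants to play with the encoding Thomas describes above, a
minimal sketch with toy vocabularies; the sizes and offsets are placeholders
and only the item datatype is handled:

    # Toy vocabularies; real ones would be read from a dump.
    ITEMS = {"Q5": 0, "Q42": 1, "Q44": 2}
    PROPS = {"P31": 0, "P42": 1}
    BLOCK = len(ITEMS) + 2                     # + is_novalue, is_somevalue

    def snak_vector(prop, item):
        # One-hot over a concatenated per-property layout.
        vec = [0.0] * (len(PROPS) * BLOCK)
        vec[PROPS[prop] * BLOCK + ITEMS[item]] = 1.0
        return vec

    def claim_vector(main, qualifiers):
        # Main snak block followed by a summed (merged) qualifier block.
        qual = [0.0] * (len(PROPS) * BLOCK)
        for prop, item in qualifiers:
            qual = [a + b for a, b in zip(qual, snak_vector(prop, item))]
        return snak_vector(*main) + qual

    # "P31: Q5" with qualifiers "P42: Q42" and "P42: Q44"
    v = claim_vector(("P31", "Q5"), [("P42", "Q42"), ("P42", "Q44")])
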
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Encoders/feature extractors for neural nets

2017-09-27 Thread John Erling Blad
Is there anyone that has done any work on how to encode statements as
features for neural nets? I'm mostly interested in sparse encoders for
online training of live networks.
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Significant change: new data type for tabular data files

2017-05-02 Thread John Erling Blad
The API console for extraction of statistics is separate from the use of
JSONstat, but it can export JSONstat. Statistics Norway has a user manual
for how to use the console.[1][2] This is a joint effort between several
Scandinavian census bureaus.

[1]
http://ssb.no/en/omssb/tjenester-og-verktoy/api/_attachment/248250?_ts=15b48207778
[2] http://ssb.no/en/omssb/tjenester-og-verktoy/api

On Tue, May 2, 2017 at 4:00 PM, John Erling Blad <jeb...@gmail.com> wrote:

> One of the most important use of this is probably JSONstat, but note that
> it is _not_ obvious which categories maps to which items, or that a
> category maps to the same item at all. For example "Oslo" might be used as
> the name of the city in Norway in some contexts, while it might be the name
> of the county in other contexts, or it might be the unincorporated
> community in Southern Florida.
>
> If a tag function is made to make and reapply queries to the API-console
> now in use at several census bureaus, then the JSONstat for specific data
> can be automatically updated. This will create some additional problems, as
> those stats can't be manually updated.
>
> Yes I have done some experiments on this, no it has not been possible to
> get this up and running for various reasons. (There must be a working cache
> with high availability, the json-lib in Scribunto is flaky, etc.)
>
> On Tue, May 2, 2017 at 3:47 PM, John Erling Blad <jeb...@gmail.com> wrote:
>
>> You know that this has pretty huge implications for the data model, and
>> that data stored in a tabular file might invalidate the statement where it
>> is referenced? And both the statement and the data file might be valid in
>> isolation? (It is two valid propositions but from different worlds.)
>>
>> On Tue, May 2, 2017 at 12:33 PM, Jane Darnell <jane...@gmail.com> wrote:
>>
>>> Interesting, thanks! I have been waiting for more developments on this
>>> since it was shown by User:TheDJ at the developer's showcase in january
>>> (link here at 5 minutes in https://www.youtube.com/watch?v=j2pR21imm9A)
>>> I was wondering if this could be used in the case of painting items
>>> being linked to old art sale catalogs. So instead of bothering with
>>> wikisource, no matter what language the catalog is in I could link to a
>>> catalog entry on commons by line and column (theoretically two columns: one
>>> column for catalog identifier, and second columns for full catalog entry,
>>> generally less than 300 characters of text).
>>>
>>> On Tue, May 2, 2017 at 10:37 AM, Léa Lacroix <lea.lacr...@wikimedia.de>
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> We’ve been working on a new data type that allows you to link to the 
>>>> *tabular
>>>> data files <https://www.mediawiki.org/wiki/Help:Tabular_Data>* that
>>>> are now stored on Commons. This data type will be deployed on Wikidata on 
>>>> *May
>>>> 15th*.
>>>>
>>>> The property creators will be able to create properties with this
>>>> tabular data type by selecting “tabular data” in the data type list.
>>>>
>>>> When the property is created, you can use it in statements, and when
>>>> filling the value, if you start typing a string, you can choose the name of
>>>> a file in the list of what exists on Commons.
>>>>
>>>> Before the deployment, you can test it on http://test.wikidata.org (
>>>> example <https://test.wikidata.org/wiki/Q59992>).
>>>>
>>>> One thing to note: We currently do not export statements that use this
>>>> datatype to RDF. They can therefore not be queried in the Wikidata Query
>>>> Service. The reason is that we are still waiting for tabular data files to
>>>> get stable URIs. This is handled in this ticket
>>>> <https://phabricator.wikimedia.org/T161527>.
>>>> If you have any question, feel free to ask!
>>>>
>>>> --
>>>> Léa Lacroix
>>>> Project Manager Community Communication for Wikidata
>>>>
>>>> Wikimedia Deutschland e.V.
>>>> Tempelhofer Ufer 23-24
>>>> 10963 Berlin
>>>> www.wikimedia.de
>>>>
>>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>>>
>>>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>>>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>>>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>>>
>>>> ___
>>>> Wikidata mailing list
>>>> Wikidata@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>
>>>>
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Significant change: new data type for tabular data files

2017-05-02 Thread John Erling Blad
A statement in a world might be valid and true in that world, but false in
another world. Two statements, each in its own world, might use a name
(parameter or instantiated) that is similar by accident, and chained
together they will form an invalid statement.

If values from the data files are available in the same RDF graph without
preventive measures (i.e. treated as the same world), it will create
problems. This is well
known in logic, so it should not be a surprise. There are several proposed
solutions for RDF-graphs, but it gets slightly (a lot) more complex.
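
To make the point concrete with plain tuples instead of RDF (the figures
are placeholders):

    # Two "worlds", each internally consistent on its own.
    world_a = [("Oslo", "instance of", "municipality"),
               ("Oslo", "population", 650000)]          # placeholder figure
    world_b = [("Oslo", "instance of", "county"),
               ("Oslo", "area_km2", 500)]               # placeholder figure

    # Merged into one graph without keeping the worlds apart, the shared
    # label "Oslo" silently chains facts about two different referents:
    merged = world_a + world_b
    # The merged node is now both a municipality and a county, with a
    # population from one referent and an area from the other.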

On Tue, May 2, 2017 at 3:56 PM, Jane Darnell <jane...@gmail.com> wrote:

> Not sure what you mean - if the datafile stored on commons links in turn
> to the source (e.g. the sale catalog hosted somewhere) then the datafile
> only acts as a transformation engine enabling blobbish text to be accessed
> as citable material for machine reading bots. Seems nifty to me.
>
> On Tue, May 2, 2017 at 3:47 PM, John Erling Blad <jeb...@gmail.com> wrote:
>
>> You know that this has pretty huge implications for the data model, and
>> that data stored in a tabular file might invalidate the statement where it
>> is referenced? And both the statement and the data file might be valid in
>> isolation? (It is two valid propositions but from different worlds.)
>>
>> On Tue, May 2, 2017 at 12:33 PM, Jane Darnell <jane...@gmail.com> wrote:
>>
>>> Interesting, thanks! I have been waiting for more developments on this
>>> since it was shown by User:TheDJ at the developer's showcase in january
>>> (link here at 5 minutes in https://www.youtube.com/watch?v=j2pR21imm9A)
>>> I was wondering if this could be used in the case of painting items
>>> being linked to old art sale catalogs. So instead of bothering with
>>> wikisource, no matter what language the catalog is in I could link to a
>>> catalog entry on commons by line and column (theoretically two columns: one
>>> column for catalog identifier, and second columns for full catalog entry,
>>> generally less than 300 characters of text).
>>>
>>> On Tue, May 2, 2017 at 10:37 AM, Léa Lacroix <lea.lacr...@wikimedia.de>
>>> wrote:
>>>
>>>> Hello all,
>>>>
>>>> We’ve been working on a new data type that allows you to link to the 
>>>> *tabular
>>>> data files <https://www.mediawiki.org/wiki/Help:Tabular_Data>* that
>>>> are now stored on Commons. This data type will be deployed on Wikidata on 
>>>> *May
>>>> 15th*.
>>>>
>>>> The property creators will be able to create properties with this
>>>> tabular data type by selecting “tabular data” in the data type list.
>>>>
>>>> When the property is created, you can use it in statements, and when
>>>> filling the value, if you start typing a string, you can choose the name of
>>>> a file in the list of what exists on Commons.
>>>>
>>>> Before the deployment, you can test it on http://test.wikidata.org (
>>>> example <https://test.wikidata.org/wiki/Q59992>).
>>>>
>>>> One thing to note: We currently do not export statements that use this
>>>> datatype to RDF. They can therefore not be queried in the Wikidata Query
>>>> Service. The reason is that we are still waiting for tabular data files to
>>>> get stable URIs. This is handled in this ticket
>>>> <https://phabricator.wikimedia.org/T161527>.
>>>> If you have any question, feel free to ask!
>>>>
>>>> --
>>>> Léa Lacroix
>>>> Project Manager Community Communication for Wikidata
>>>>
>>>> Wikimedia Deutschland e.V.
>>>> Tempelhofer Ufer 23-24
>>>> 10963 Berlin
>>>> www.wikimedia.de
>>>>
>>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>>>
>>>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>>>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>>>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>>>
>>>> ___
>>>> Wikidata mailing list
>>>> Wikidata@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>
>>>>
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Significant change: new data type for tabular data files

2017-05-02 Thread John Erling Blad
One of the most important uses of this is probably JSONstat, but note that
it is _not_ obvious which categories map to which items, or that a category
maps to the same item in every context. For example "Oslo" might be used as
the name of the city in Norway in some contexts, while it might be the name
of the county in other contexts, or it might be the unincorporated
community in Southern Florida.

If a tag function is made to build and reapply queries against the API
console now in use at several census bureaus, then the JSONstat for
specific data
can be automatically updated. This will create some additional problems, as
those stats can't be manually updated.

Yes I have done some experiments on this, no it has not been possible to
get this up and running for various reasons. (There must be a working cache
with high availability, the json-lib in Scribunto is flaky, etc.)
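
A minimal sketch of what the mapping has to look like; per table, not
global, and the item ids and category codes below are made-up placeholders:

    # JSON-stat category code -> Wikidata item, scoped per table, because
    # the same label ("Oslo") resolves to different items in different tables.
    CATEGORY_TO_ITEM = {
        "population_by_municipality": {"0301": "Q111"},   # "Oslo" the municipality
        "area_by_county":             {"03":   "Q222"},   # "Oslo" the county
    }

    def resolve(table_id, category_code):
        # Return the mapped item, or None so the caller can flag it for review.
        return CATEGORY_TO_ITEM.get(table_id, {}).get(category_code)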

On Tue, May 2, 2017 at 3:47 PM, John Erling Blad <jeb...@gmail.com> wrote:

> You know that this has pretty huge implications for the data model, and
> that data stored in a tabular file might invalidate the statement where it
> is referenced? And both the statement and the data file might be valid in
> isolation? (It is two valid propositions but from different worlds.)
>
> On Tue, May 2, 2017 at 12:33 PM, Jane Darnell <jane...@gmail.com> wrote:
>
>> Interesting, thanks! I have been waiting for more developments on this
>> since it was shown by User:TheDJ at the developer's showcase in january
>> (link here at 5 minutes in https://www.youtube.com/watch?v=j2pR21imm9A)
>> I was wondering if this could be used in the case of painting items being
>> linked to old art sale catalogs. So instead of bothering with wikisource,
>> no matter what language the catalog is in I could link to a catalog entry
>> on commons by line and column (theoretically two columns: one column for
>> catalog identifier, and second columns for full catalog entry, generally
>> less than 300 characters of text).
>>
>> On Tue, May 2, 2017 at 10:37 AM, Léa Lacroix <lea.lacr...@wikimedia.de>
>> wrote:
>>
>>> Hello all,
>>>
>>> We’ve been working on a new data type that allows you to link to the 
>>> *tabular
>>> data files <https://www.mediawiki.org/wiki/Help:Tabular_Data>* that are
>>> now stored on Commons. This data type will be deployed on Wikidata on *May
>>> 15th*.
>>>
>>> The property creators will be able to create properties with this
>>> tabular data type by selecting “tabular data” in the data type list.
>>>
>>> When the property is created, you can use it in statements, and when
>>> filling the value, if you start typing a string, you can choose the name of
>>> a file in the list of what exists on Commons.
>>>
>>> Before the deployment, you can test it on http://test.wikidata.org (
>>> example <https://test.wikidata.org/wiki/Q59992>).
>>>
>>> One thing to note: We currently do not export statements that use this
>>> datatype to RDF. They can therefore not be queried in the Wikidata Query
>>> Service. The reason is that we are still waiting for tabular data files to
>>> get stable URIs. This is handled in this ticket
>>> <https://phabricator.wikimedia.org/T161527>.
>>> If you have any question, feel free to ask!
>>>
>>> --
>>> Léa Lacroix
>>> Project Manager Community Communication for Wikidata
>>>
>>> Wikimedia Deutschland e.V.
>>> Tempelhofer Ufer 23-24
>>> 10963 Berlin
>>> www.wikimedia.de
>>>
>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>>
>>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Significant change: new data type for tabular data files

2017-05-02 Thread John Erling Blad
You know that this has pretty huge implications for the data model, and
that data stored in a tabular file might invalidate the statement where it
is referenced? And both the statement and the data file might be valid in
isolation? (They are two valid propositions, but from different worlds.)

On Tue, May 2, 2017 at 12:33 PM, Jane Darnell  wrote:

> Interesting, thanks! I have been waiting for more developments on this
> since it was shown by User:TheDJ at the developer's showcase in january
> (link here at 5 minutes in https://www.youtube.com/watch?v=j2pR21imm9A)
> I was wondering if this could be used in the case of painting items being
> linked to old art sale catalogs. So instead of bothering with wikisource,
> no matter what language the catalog is in I could link to a catalog entry
> on commons by line and column (theoretically two columns: one column for
> catalog identifier, and second columns for full catalog entry, generally
> less than 300 characters of text).
>
> On Tue, May 2, 2017 at 10:37 AM, Léa Lacroix 
> wrote:
>
>> Hello all,
>>
>> We’ve been working on a new data type that allows you to link to the *tabular
>> data files * that are
>> now stored on Commons. This data type will be deployed on Wikidata on *May
>> 15th*.
>>
>> The property creators will be able to create properties with this tabular
>> data type by selecting “tabular data” in the data type list.
>>
>> When the property is created, you can use it in statements, and when
>> filling the value, if you start typing a string, you can choose the name of
>> a file in the list of what exists on Commons.
>>
>> Before the deployment, you can test it on http://test.wikidata.org (
>> example ).
>>
>> One thing to note: We currently do not export statements that use this
>> datatype to RDF. They can therefore not be queried in the Wikidata Query
>> Service. The reason is that we are still waiting for tabular data files to
>> get stable URIs. This is handled in this ticket
>> .
>> If you have any question, feel free to ask!
>>
>> --
>> Léa Lacroix
>> Project Manager Community Communication for Wikidata
>>
>> Wikimedia Deutschland e.V.
>> Tempelhofer Ufer 23-24
>> 10963 Berlin
>> www.wikimedia.de
>>
>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>
>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wikidata-tech] Script and API module for constraint checks

2017-04-27 Thread John Erling Blad
Good idea, but I wonder if this exposes users to the very alien language
in the constraint reports.
It is often very difficult to understand what the error is about, not to
mention how to fix it.

On Fri, Apr 28, 2017 at 12:28 AM, Stas Malyshev 
wrote:

> Hi!
>
> > That’s a bug in the constraints on Wikidata – “date of birth” has a
> > constraint stating that its value must be at least 30 years away from
> > the “date of birth” value. We’ll work on resolving this (I contacted
> > Ivan Krestinin, who added this “experimental constraint”, to ask if he
> > still needs it – and if it can’t be removed from the P569 talk page for
> > some reason, we’ll probably filter it out in the user script).
>
> The checks seem to be against P184/P185, of which neither is on
> Q5066005. So I suspect there's some bug still somewhere.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [Wiki-research-l] The basis for Wikidata quality

2017-03-22 Thread John Erling Blad
Only using sitelinks as a weak indication of quality seems correct to me.
Also the idea that some languages are more important than others, and some
large languages are more important than others. I would really like it if
the reasoning behind the classes and the features could be spelled out.

I have serious issues with the ORES training sets, but that is another
discussion. ;/ (There are a lot of similar bot edits in the sets, and that
will train a bot-detector, which is not what we need! Grumpf…)

On Wed, Mar 22, 2017 at 3:33 PM, Aaron Halfaker 
wrote:

> Hey wiki-research-l folks,
>
> Gerard didn't actually link you to the quality criteria he takes issue
> with.  See https://www.wikidata.org/wiki/Wikidata:Item_quality  I think
> Gerard's argument basically boils down to Wikidata != Wikipedia, but it's
> unclear how that is relevant to the goal of measuring the quality of
> items.  This is something I've been talking to Lydia about for a long
> time.  It's been great for the few Wikis where we have models deployed in
> ORES[1] (English, French, and Russian Wikipedia).  So we'd like to have the
> same for Wikidata.   As Lydia said, we do all sorts of fascinating things
> with a model like this.  Honestly, I think the criteria is coming together
> quite nicely and we're just starting a pilot labeling campaign to work
> through a set of issues before starting the primary labeling drive.
>
> 1. https://ores.wikimedia.org
>
> -Aaron
>
>
>
> On Wed, Mar 22, 2017 at 6:39 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com> wrote:
>
>> Hoi,
>> What I have read is that it will be individual items that are graded. That
>> is not what helps you determine what items are lacking in something. When
>> you want to determine if something is lacking you need a relational
>> approach. When you approach a award like this one [1], it was added to
>> make
>> the award for a person [2] more complete. No real importance is given to
>> this award, just a few more people were added because they are part of a
>> group that gets more attention from me [3]. For yet another award [4], I
>> added all the people who received the award because I was told by
>> someone's
>> expert opinion that they were all notable (in the Wikipedia sense of the
>> word). I added several of these people in Wikidata. Arguably, the Wikidata
>> the quality for the item for the award is great but it has no article
>> associated to it in Wikipedia but that has nothing to do with the quality
>> of the information it provides. It is easy and obvious to recognise in one
>> level deeper that quality issues arise; the info for several people is
>> meagre at best.You cannot deny their relevance though; removing them
>> destroys the quality for the award.
>>
>> The point is that in relations you can describe quality, in the grading
>> that is proposed there is nothing really that is actionable.
>>
>> When you add links to the mix, these same links have no bearing on the
>> quality of the Wikidata item. Why would it? Links only become interesting
>> when you compare the statements in Wikidata with the links to other
>> articles in the same Wikipedia. This is not what this approach brings.
>>
>> Really, how will the grades to items make a difference. How will it help
>> us
>> understand that "items relating to railroads are lacking"? It does not.
>>
>> When you want to have indicators for quality; here is one.. an author (and
>> its subclasses) should have a VIAF identifier. An artist with objects in
>> the Getty Museum should have an ULAN number. The lack of such information
>> is actionable. The number of interwiki links is not, the number of
>> statements are not and even references are not that convincing.
>> Thanks,
>>   GerardM
>>
>> [1] https://tools.wmflabs.org/reasonator/?=29000734
>> [2] https://tools.wmflabs.org/reasonator/?=7315382
>> [3] https://tools.wmflabs.org/reasonator/?=3308284
>> [4] https://tools.wmflabs.org/reasonator/?=28934266
>>
>> On 22 March 2017 at 11:56, Lydia Pintscher 
>> wrote:
>>
>> > On Wed, Mar 22, 2017 at 10:03 AM, Gerard Meijssen
>> >  wrote:
>> > > In your reply I find little argument why this approach is useful. I do
>> > not
>> > > find a result that is actionable. There is little point to this
>> approach
>> > > and it does not fit with well with much of the Wikidata practice.
>> >
>> > Gerard, the outcome will be very actionable. We will have the
>> > groundwork needed to identify individual items and sets of items that
>> > need improvement. If it for example turns out that our items related
>> > to railroads are particularly lacking then that is something we can
>> > concentrate on if we so chose. We can do editathons, data
>> > partnerships, quality drives and and and.
>> >
>> >
>> > Cheers
>> > Lydia
>> >
>> > --
>> > Lydia Pintscher - http://about.me/lydia.pintscher
>> > Product Manager for Wikidata
>> >
>> > Wikimedia Deutschland e.V.
>> 

Re: [Wikidata] The basis for Wikidata quality

2017-03-22 Thread John Erling Blad
Forgot to mention: this is not really about _quality_ (Gerard says "model
of quality"); it is about _trust_ and _reputation_. Something can have low
quality and high trust, cf. cheap cell phones, and the reputation might not
reflect the actual quality.

You (usually) measure reputation and calculate trust, but I have seen it
the other way around. The end result is the same anyhow.

On Wed, Mar 22, 2017 at 3:31 PM, John Erling Blad <jeb...@gmail.com> wrote:

> Sitelinks to an item are an approximation of the number of views of the
> data from an item, and as such gives an approximation to the likelihood of
> detecting an error. Few views imply a larger time span before an error is
> detected. It is really about estimating quality as a function of the age of
> the item as number of page views, but approximated through sitelinks.
>
> Problem is, the number of sitelinks is not a good approximation. Yes it is
> a simple approximation, but it is still pretty bad.
>
> References are an other way to verify the data, but that is not a valid
> argument against measuring the age of the data.
>
> I've been toying with an idea for some time that use statistical inference
> to try to identify questionable facts, but it will probably not be done -
> it is way to much work to do in spare time.
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] The basis for Wikidata quality

2017-03-22 Thread John Erling Blad
Sitelinks to an item are an approximation of the number of views of the
data from an item, and as such give an approximation of the likelihood of
detecting an error. Few views imply a larger time span before an error is
detected. It is really about estimating quality as a function of the age of
the item and its number of page views, but approximated through sitelinks.

Problem is, the number of sitelinks is not a good approximation. Yes it is
a simple approximation, but it is still pretty bad.

References are another way to verify the data, but that is not a valid
argument against measuring the age of the data.

I've been toying with an idea for some time that uses statistical inference
to try to identify questionable facts, but it will probably not be done -
it is way too much work to do in spare time.
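
A toy illustration of why I think the approximation is bad; the functional
form and all the numbers are invented, it is only meant to show the spread:

    def expected_detection_delay(daily_views, spot_rate=1e-4):
        # Crude model: every view has a small chance of spotting an error,
        # so the expected delay before detection scales as 1 / views.
        return 1.0 / (daily_views * spot_rate)

    # Sitelink count is used as a stand-in for views, but items with the
    # same number of sitelinks can differ in views by orders of magnitude,
    # and so will the expected time before an error is caught.
    for views in (5, 500, 50000):
        print(views, "views/day ->", expected_detection_delay(views), "days")
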
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Upload mathematical formulas in Wikidata

2016-09-21 Thread John Erling Blad
I was looking at the math formulas yesterday, and it seems to me that if
this should be useful then it must include the proof. That is a problem as
a proof is a sequence of steps. How should we do that? And how do we
describe the transition from one step to the next? The descriptions of the
transitions sit in between the steps, and they are multilingual.
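
Just to make the shape of the problem concrete; a sketch of the structure,
not a proposal for how it should be modelled:

    from dataclasses import dataclass, field

    @dataclass
    class ProofStep:
        formula_tex: str                        # the formula at this step
        # The transition *into* this step, as monolingual text per language.
        transition: dict = field(default_factory=dict)

    # A proof is an ordered sequence; the multilingual prose sits between steps.
    proof = [
        ProofStep(r"a^2 + b^2 = c^2"),
        ProofStep(r"c = \sqrt{a^2 + b^2}",
                  transition={"en": "take the square root on both sides",
                              "nb": "ta kvadratroten på begge sider"}),
    ]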

On Wed, Sep 21, 2016 at 5:53 PM, kaushal dudhat 
wrote:

> Hello,
>
> My name is Kaushal and currently I and a friend are working on AskPlatypus
> as part of our master thesis. We want to add a module to AskPlatypus which
> answers mathematical questions with the use of Wikidata. As a first step we
> want to add more mathematical formulas to Wikidata. We extracted a lot of
> them from Wikipedia. There are 17838 formulas now. It would be great to get
> them uploaded into primary source tool.
> The list of formulas in primary source tool syntax is attached here.
>
> Please have look. It would be great if someone could upload them into the
> primary sources tool.
>
>
> Greetings
> Kaushal
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Dynamic Lists, Was: Re: List generation input

2016-09-19 Thread John Erling Blad
Either the list should just be an entry point for a list structure or
table that is completely created outside the editors realm, or it should be
possible to merge any user edit with content from the bot. There should be
no in-between where the user needs additional knowledge about how to edit
the bot-produced content or even that (s)he can't edit the bot-produced
content. From the user's (editor's) point of view there should be no special
precautions to how some pages should be edited.

At the moment there are two pages in the main space at nowiki using
listeria bot: Bluepoint Games [1] and Thatgamecompany [2]. There is a (weak?)
consensus on not using the bot, so if a discussion is started they will
probably be removed. The main argument against using it is that it
overwrites edits made by other users.

[1] https://no.wikipedia.org/wiki/Bluepoint_Games
[2] https://no.wikipedia.org/wiki/Thatgamecompany
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] (Ab)use of "deprecated"

2016-08-12 Thread John Erling Blad
A last note; listen to Markus, he is usually right.
Darn! 

On Fri, Aug 12, 2016 at 12:02 PM, John Erling Blad <jeb...@gmail.com> wrote:

> Latest date for population isn't necessarily the preferred one, it can be
> a predicted one for a short timespan. For example Statistics Norway provide
> a 3 month expectation in addition to the one year stats. The one year stats
> should be the preferred ones, the 3 month stats are kind of expected change
> on last years stats.
>
> Main problem with the 3 month stats are that they usually can't be used
> together with one-year stats, ie. they can't be normalized against the same
> base. Absolute value would seem the same, but growt rate against a one-year
> base would be wrong. It is a quite usual to do that error.
>
> A lot of stats "sounds similar" but isn't similar. It is a bit awkward.
> Sometimes stats refer to international standards for how they should be
> made, in those cases they can be compared. It is often described on a page
> for metadata about the stats. An example is population in rural areas,
> which many assume is the same in all countries. It is not.
>
> And while I'm on it; stats often describe a (possibly temporal) connection
> or relation between two or more (types of) subjects, and it is not
> something you should assign to one of the subject. If one part is a
> concrete instance then it makes sense to add stats about the other types to
> that item, like population for a municipality, but otherwise it could be
> wrong.
>
> In general, setting the last added or most recent value to preferred is in
> general wrong.
>
> And also, that something is not-preferred does not imply that it is
> deprecated. And also note the difference between deprecated and deferred.
>
> On Thu, Aug 11, 2016 at 10:56 PM, Stas Malyshev <smalys...@wikimedia.org>
> wrote:
>
>> Hi!
>>
>> > I would argue that this is better done by using qualifiers (e.g. start
>> > data, end data).  If a statement on the population size would be set to
>> > preferred, but isn't monitored for quite some time, it can be difficult
>> > to see if the "preferred" statement is still accurate, whereas a
>> > qualifier would give a better indication that that stament might need an
>> > update.
>>
>> Right now this bot:
>> https://www.wikidata.org/wiki/User:PreferentialBot
>> watches statements like "population" that have multiple values with
>> different time qualifiers but no current preference.
>>
>> What it doesn't currently do is to verify that the preferred one refers
>> to the latest date. It probably shouldn't fix these cases (because there
>> may be valid cause why the latest is not the best, e.g. some population
>> estimates are more precise than others) but it can alert about it. This
>> can be added if needed.
>>
>> --
>> Stas Malyshev
>> smalys...@wikimedia.org
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] (Ab)use of "deprecated"

2016-08-12 Thread John Erling Blad
The value with the latest date for population isn't necessarily the
preferred one; it can be a predicted one for a short timespan. For example
Statistics Norway provides a 3-month expectation in addition to the
one-year stats. The one-year stats should be the preferred ones; the
3-month stats are more like an expected change on top of last year's stats.

The main problem with the 3-month stats is that they usually can't be used
together with one-year stats, i.e. they can't be normalized against the
same base. The absolute value would seem the same, but a growth rate
against a one-year base would be wrong. It is quite usual to make that
error.

A lot of stats "sounds similar" but isn't similar. It is a bit awkward.
Sometimes stats refer to international standards for how they should be
made, in those cases they can be compared. It is often described on a page
for metadata about the stats. An example is population in rural areas,
which many assume is the same in all countries. It is not.

And while I'm on it: stats often describe a (possibly temporal) connection
or relation between two or more (types of) subjects, and that is not
something you should assign to just one of the subjects. If one part is a
concrete instance then it makes sense to add stats about the other types to
that item, like population for a municipality, but otherwise it could be
wrong.

Setting the last added or most recent value to preferred is, in general,
wrong.

Also, that something is not preferred does not imply that it is
deprecated. And note the difference between deprecated and deferred.
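
A toy example of the normalization trap, with invented numbers:

    # One-year population figures (invented).
    pop_2014, pop_2015 = 100000, 102000
    yearly_growth = (pop_2015 - pop_2014) / pop_2014      # 2.0 % over one year

    # A 3-month figure published after pop_2015 (invented).
    pop_2015_q1 = 102600

    # Wrong: treating the 3-month figure as another one-year point gives an
    # apparent growth of about 0.6 %, because the base period is different.
    mixed_base = (pop_2015_q1 - pop_2015) / pop_2015

    # The 3-month figure only becomes comparable after rescaling to a year.
    annualized = (1 + mixed_base) ** 4 - 1                # roughly 2.4 %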

On Thu, Aug 11, 2016 at 10:56 PM, Stas Malyshev 
wrote:

> Hi!
>
> > I would argue that this is better done by using qualifiers (e.g. start
> > data, end data).  If a statement on the population size would be set to
> > preferred, but isn't monitored for quite some time, it can be difficult
> > to see if the "preferred" statement is still accurate, whereas a
> > qualifier would give a better indication that that stament might need an
> > update.
>
> Right now this bot:
> https://www.wikidata.org/wiki/User:PreferentialBot
> watches statements like "population" that have multiple values with
> different time qualifiers but no current preference.
>
> What it doesn't currently do is to verify that the preferred one refers
> to the latest date. It probably shouldn't fix these cases (because there
> may be valid cause why the latest is not the best, e.g. some population
> estimates are more precise than others) but it can alert about it. This
> can be added if needed.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Grammatical display of units

2016-07-30 Thread John Erling Blad
Norwegian has a lot of colloquialisms that must be handled if you want the
language to sound natural. The example with "kilo" exists in a lot of
languages in one form or another.
Then you have congruence on external factors (direction, length,
emptiness), missing plurals for some units (Norwegian mil is one example), …

On Sat, Jul 30, 2016 at 5:58 AM, Jan Macura <macura...@gmail.com> wrote:

> Hi John, all
>
> 2016-07-29 15:54 GMT+02:00 John Erling Blad <jeb...@gmail.com>:
>
>> In general this has more implications than simple singular/plural forms
>> of units. Agreement/concord/congruence is the proper term. In some
>> language you will even change the form given the distance to the thing you
>> are measuring or counting, even depending on the type of thing you are
>> measuring or counting, or change on the gender of the thing, and then even
>> only for some numbers.
>>
>
> Linguistic agreement is common in a lot of inflected languages [1].
>
> Now assume "kilogram" is changed to the short form "kilo", then it is "én
>> kilo" which is masculinum. The prefix "kilo" is only used for "kilogram",
>> so it isn't valid Norwegian til say "én kilo" when referring to "1 km", or
>> "én milli" when refering to "1 milligram".
>
>
> On the other hand, we don't have to deal with colloquialisms like "kilo"
> in your example. Modelling the formal language would be still hard enough.
>
> Best,
>  Jan
>
> [1] https://en.wikipedia.org/wiki/Fusional_language
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Grammatical display of units

2016-07-29 Thread John Erling Blad
In general this has more implications than simple singular/plural forms of
units. Agreement/concord/congruence is the proper term.[1] In some
languages you will even change the form given the distance to the thing you
are measuring or counting, even depending on the type of thing you are
measuring or counting, or change on the gender of the thing, and then even
only for some numbers.

Assume you have "1 meter", then you could write it out as "én meter" in
Norwegian as "meter" is masculinum. Now assume you have "1 kilogram", then
you would write it out as "ett kilogram" as "gram" is neutrum. Now assume
"kilogram" is changed to the short form "kilo", then it is "én kilo" which
is masculinum. The prefix "kilo" is only used for "kilogram", so it isn't
valid Norwegian to say "én kilo" when referring to "1 km", or "én milli"
when referring to "1 milligram".

[1] https://en.wikipedia.org/wiki/Agreement_(linguistics)
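
A minimal sketch of just the Norwegian part of this; the gender table and
forms below are only the examples from this mail, nowhere near a complete
solution:

    # Grammatical gender of some Norwegian unit words. "kilo" is masculine
    # even though "kilogram" is neuter, so the numeral for 1 changes form.
    UNIT_GENDER = {"meter": "m", "kilogram": "n", "gram": "n", "kilo": "m"}
    ONE = {"m": "én", "n": "ett"}

    def format_nb(value, unit):
        # Very rough agreement for the numeral "one" in Norwegian Bokmål.
        if value == 1:
            return f"{ONE[UNIT_GENDER[unit]]} {unit}"
        return f"{value} {unit}"

    print(format_nb(1, "meter"))      # én meter
    print(format_nb(1, "kilogram"))   # ett kilogram
    print(format_nb(1, "kilo"))       # én kilo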

On Fri, Jul 29, 2016 at 7:26 AM, Markus Kroetzsch <
markus.kroetz...@tu-dresden.de> wrote:

> On 28.07.2016 20:41, Stas Malyshev wrote:
>
>> Hi!
>>
>> Good point. Could we not just have a monolingual text string property
>>> that gives the preferred writing of the unit when used after a number? I
>>> don't think the plural/singular issue is very problematic, since you
>>> would have plural almost everywhere, even for "1.0 metres". So maybe we
>>>
>>
>> We have code to deal with that - note that "1 reference" and "2
>> references" are displayed properly. It's a matter of applying that code
>> and having it provided with proper configs.
>>
>
> You mean the MediaWiki message processing code? This would probably be
> powerful enough for units as well, but it works based on message strings
> that look a bit like MW template calls. Someone has to enter such strings
> for all units (and languages). This would be doable but the added power
> comes at the price of more difficult editing of such message strings
> instead of plain labels.
>
> As far as I know, the message parsing is available through the MW API, so
> external consumers could take advantage of the same system if the message
> strings were part of the data (we would like to have grammatical units in
> SQID as well).
>
>
>> just need one alternative label for most languages? Or are there
>>> languages with more complex grammar rules for units?
>>>
>>
>> Oh yes :) Russian is one, but I'm sure there are others.
>>
>>
> Forgive my ignorance; I was not able to read the example you gave there.
>
> Markus
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] ArticlePlaceholder rolled out to next wikis

2016-06-10 Thread John Erling Blad
The page is missing a link back to the item; it is now a dead end unless
you want to create an article.
I guess that isn't quite obvious…

On Fri, Jun 10, 2016 at 2:23 PM, Magnus Manske 
wrote:

> -- Forwarded message -
> From: Lydia Pintscher 
> * Gujarati (
> https://gu.wikipedia.org/wiki/%E0%AA%B5%E0%AA%BF%E0%AA%B6%E0%AB%87%E0%AA%B7:AboutTopic/Q13520818
> )
>
> Honoured to be a test item, even if I have never heard about that language
> before... :-)
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] editing from Wikipedia and co

2016-06-04 Thread John Erling Blad
An additional note.

The problem is that a community can handle a quite specific workload. Some
of that goes into producing new articles, some goes into patrolling. Some
goes into maintenance of existing articles. When a project has to much
dynamic content (and it will always have some dynamic content) they start
to move into a maintenance mode, because they are swamped by the dynamic
content.

A typical indication that something is going on is that the patrol log
starts to overflow. Another is that the production of new articles starts
to drop, but that will drop anyhow because of the addition of new content
to old articles.[1] To get good numbers we need the ratio "new
content"/"edited old content". When that number starts to drop, then we know
that the community starts to run into problems.

If we had unlimited resources, then we could add more workload, but we
don't have unlimited resources (a.k.a. man-hours). The community is
limited. Adding new work to the existing workload will thus not scale very
well, if at all. We need ways
to cope with the existing workload, not additional work.

In short; nice thesis, but even if it can be _implemented_ it will not
scale on Wikipedia.

And of course, someone will surely claim that we could just get some
more members in the community. Yes, sure, some of us have been working on
that for several years.[2]

[1]
https://commons.wikimedia.org/wiki/File:Stats-nowiki-2016-05-07-new-articles.png
[2]
https://commons.wikimedia.org/wiki/File:Stats-nowiki-2016-05-07-new-users.png

On Sat, Jun 4, 2016 at 11:55 AM, John Erling Blad <jeb...@gmail.com> wrote:

> Given Lydias post I wonder if it is to be expected that editors on
> Wikipedia shall manually import statements from Wikidata, as this is what
> can be read out of this thesis. This will create a huge backlog of work on
> all Wikipedias, and I can't see how we possibly can do this. For the moment
> we have a huge backlog on sources on nowiki, and adding a lot of additional
> manual work will not go very well with the community.
>
> What is the plan, can Lydia or some else please clarify?
>
> On Mon, May 30, 2016 at 11:43 PM, John Erling Blad <jeb...@gmail.com>
> wrote:
>
>> Page 21, moving to manual import of statements. I would really like to
>> see the analysis written out that ends in this conclusion. It is very
>> tempting, but the idea don't scale.
>>
>> We have now about 5-10 000 articles per active user. Those users have a
>> huge backlog of missing references. If they shall manage statements in
>> addition to their current backlog, then they will simply be overwhelmed.
>>
>>
>> On Mon, May 30, 2016 at 6:05 PM, Lydia Pintscher <
>> lydia.pintsc...@wikimedia.de> wrote:
>>
>>> Hey folks :)
>>>
>>> Charlie has been working on concepts for making it possible to edit
>>> Wikidata from Wikipedia and other wikis. This was her bachelor thesis. She
>>> has now published it:
>>> https://commons.wikimedia.org/wiki/File:Facilitating_the_use_of_Wikidata_in_Wikimedia_projects_with_a_user-centered_design_approach.pdf
>>> I am very happy she put a lot of thought and work into figuring out all
>>> the complexities of the topic and how to make this understandable for
>>> editors. We still have more work to do on the concepts and then actually
>>> have to implement it. Comments welcome.
>>>
>>>
>>> Cheers
>>> Lydia
>>> --
>>> Lydia Pintscher - http://about.me/lydia.pintscher
>>> Product Manager for Wikidata
>>>
>>> Wikimedia Deutschland e.V.
>>> Tempelhofer Ufer 23-24
>>> 10963 Berlin
>>> www.wikimedia.de
>>>
>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>>
>>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] editing from Wikipedia and co

2016-06-04 Thread John Erling Blad
Given Lydia's post I wonder if it is to be expected that editors on
Wikipedia shall manually import statements from Wikidata, as this is what
can be read out of this thesis. This will create a huge backlog of work on
all Wikipedias, and I can't see how we possibly can do this. For the moment
we have a huge backlog on sources on nowiki, and adding a lot of additional
manual work will not go very well with the community.

What is the plan, can Lydia or some else please clarify?

On Mon, May 30, 2016 at 11:43 PM, John Erling Blad <jeb...@gmail.com> wrote:

> Page 21, moving to manual import of statements. I would really like to see
> the analysis written out that ends in this conclusion. It is very tempting,
> but the idea don't scale.
>
> We have now about 5-10 000 articles per active user. Those users have a
> huge backlog of missing references. If they shall manage statements in
> addition to their current backlog, then they will simply be overwhelmed.
>
>
> On Mon, May 30, 2016 at 6:05 PM, Lydia Pintscher <
> lydia.pintsc...@wikimedia.de> wrote:
>
>> Hey folks :)
>>
>> Charlie has been working on concepts for making it possible to edit
>> Wikidata from Wikipedia and other wikis. This was her bachelor thesis. She
>> has now published it:
>> https://commons.wikimedia.org/wiki/File:Facilitating_the_use_of_Wikidata_in_Wikimedia_projects_with_a_user-centered_design_approach.pdf
>> I am very happy she put a lot of thought and work into figuring out all
>> the complexities of the topic and how to make this understandable for
>> editors. We still have more work to do on the concepts and then actually
>> have to implement it. Comments welcome.
>>
>>
>> Cheers
>> Lydia
>> --
>> Lydia Pintscher - http://about.me/lydia.pintscher
>> Product Manager for Wikidata
>>
>> Wikimedia Deutschland e.V.
>> Tempelhofer Ufer 23-24
>> 10963 Berlin
>> www.wikimedia.de
>>
>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>
>> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
>> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
>> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Ontology

2016-05-16 Thread John Erling Blad
There was a previous statement about an entity which is now deprecated. You
may as well add a source stating why it is deprecated.

On Sat, May 14, 2016 at 7:55 PM, Smolenski Nikola  wrote:

> Citiranje Gerard Meijssen :
> > I have stopped expecting necessary changes from Wikidata. It has been
> made
> > clear that dates will be associated with labels. By the way we can and do
> > already indicate the validity of facts on time.
>
> I can't see why would dates be associated with labels. Can someone explain?
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] StatBank at Statistics Norway will be accessible through new open API

2016-04-23 Thread John Erling Blad
At Statistics Norway (SSB) there is a service called "StatBank Norway"
("Statistikkbanken").[1][2] For some time it has been possible to access
this through an open API, serving JSON-stat.[4] Now they are opening up all
the remaining access, and all 5000 tables will be made available.[3]

SSB uses NLOD,[5][6] an open license, on their published data. (I asked them,
and all they really want is for the source to be clearly given, so as to
avoid falsified data.)

[1] https://www.ssb.no/en/statistikkbanken
[2]
https://www.ssb.no/en/informasjon/om-statistikkbanken/how-to-use-statbank-norway
[3]
http://www.ssb.no/omssb/om-oss/nyheter-om-ssb/ssb-gjor-hele-statistikkbanken-tilgjengelig-som-apne-data
(Norwegian)
[4] https://json-stat.org/
[5] http://www.ssb.no/en/informasjon/copyright
[6] http://data.norge.no/nlod/en/1.0
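
For those who want to try it from a script, a rough sketch; the table
number, the query body and the response keys are assumptions on my part, so
check SSB's own API documentation for the exact format:

    import requests

    # Table id and query are examples only.
    URL = "https://data.ssb.no/api/v0/en/table/07459"
    QUERY = {"query": [], "response": {"format": "json-stat2"}}

    resp = requests.post(URL, json=QUERY, timeout=30)
    resp.raise_for_status()
    data = resp.json()

    # JSON-stat keeps the dimension metadata and a flat value array side by side.
    print(data.get("label"))
    print(data.get("id"), data.get("size"))    # dimension ids and their sizes
    print(data.get("value", [])[:10])          # first cells of the flattened cube
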
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Bachelor's thesis on ArticlePlaceholder

2016-04-03 Thread John Erling Blad
> Ordering of statement groups

The solution described (?) seems to me a "dev-ish" way to do this, and I
think it is wrong. The grouping is something that should be done
dynamically, as it depends on the item itself (i.e. the knowledge base),
its class hierarchy (an interpretation of the knowledge base, often part of
the knowledge base), our communicative goal (the overall context of the
communication), the discourse (usually we drop this, as we don't maintain
state), and the user model (which changes through a Wikipedia article).
This 4-tuple is pretty well known in Natural Language Generation, but the
implications for reuse of Wikidata statements in Wikipedia are mostly
neglected. (That is not something Lucie should discuss in a bachelor
thesis, but it is extremely important if the goal for Wikidata is actual
reuse on Wikipedia.)

That said: I tried to figure out what the idea is, and also read the RfC
(Statement group ordering [1]), but I actually don't know what's planned
here. I think I know, but most probably I don't. Is the statement group
ordering an on-wiki list of ordered groups? How do you create those
groups? What are the implications of those groups? Does it have
implications for other visualizations? What if the groups should follow
the type of the item? It seems like this describes a system where "one
size fits all - or make it yourself".

And not to forget, where is the discussion? An RfC with no discussion?

[1]
https://www.mediawiki.org/wiki/Requests_for_comment/Statement_group_ordering
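
To show what I mean by the grouping following the type of the item, a small
sketch; the group names, the class list and the fallback are made up for
the example:

    # Ordered statement groups per class of the item (P31 value).
    GROUP_ORDER_BY_CLASS = {
        "Q5":     ["names", "dates", "family", "career", "identifiers"],   # human
        "Q23397": ["names", "location", "hydrology", "identifiers"],       # lake
    }
    DEFAULT_ORDER = ["names", "statements", "identifiers"]

    def group_order(instance_of):
        # Pick an ordering from the item's class, falling back to one default.
        for qid in instance_of:
            if qid in GROUP_ORDER_BY_CLASS:
                return GROUP_ORDER_BY_CLASS[qid]
        return DEFAULT_ORDER

    print(group_order(["Q23397"]))   # a lake gets the lake-specific layout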

On Sun, Apr 3, 2016 at 5:06 PM, John Erling Blad <jeb...@gmail.com> wrote:

>
> > Red links are used frequently in Wikipedia to indicate an article which
> > does not yet exist, but should. Today it leads the user to an empty
> > create article page. In the future it should instead bring them to an
> > ArticlePlaceholder, offering the option of creating an article. This is
> > part of the topic of smart red links, which is discussed in the section
> > 8.1: Smart red links
>
> It should be interesting to hear if someone have an idea how this might
> work. There are some attempts on this at nowiki, none of them seems to work
> in all cases.
>
> Note that "Extension:ArticlePlaceholder/Smart red links" doesn't really
> solve the problem for existing redlinks, it solves the association problem
> when the user tries to resolve the redlink. That is one step further down
> the line, or more like solving the redlinks for a disambiguation page. ("I
> know there is a page like this, named like so, on that specific project.")
>
> Note also that an item is not necessarily described on any project, and
> that creating an item on Wikidata can be outside the editors scope or even
> very difficult. Often we have a name of some "thing", but we only have a
> sketchy idea about the thing itself. Check out
> https://www.wikidata.org/wiki/Q12011301 for an example.
>
> It seems like a lot of what are done so far on redlinks is an attempt to
> make pink-ish links with _some_information_, while the problem is that
> redlinks have _no_information_. The core reason why we have redlinks is
> that we lacks manpower to avoid them, and because of that we can't just add
> "some information". It is not a problem of what we need first, hens or
> eggs, as we have none of them.
>
> On Sun, Apr 3, 2016 at 4:27 PM, John Erling Blad <jeb...@gmail.com> wrote:
>
>> Just read through the doc, and found some important points. I post each
>> one in a separate mail.
>>
>> > Since it is hard to decide which content is actually notable, the
>> > items appearing in the search should be limited to the ones having at
>> > least one statement and two sitelinks to the same project (like
>> > Wikipedia or Wikivoyage).
>>
>> This is a good baseline, but figuring out what is notable locally is a
>> bit more involved. A language is used in a local area, and within that area
>> some items are more important just because they reside within the area.
>> This is quite noticeable in the differences between nnwiki and nowiki which
>> both basically covers "Norway". Also items that somehow relates to the
>> local area or language is more noticeable than those outside those areas.
>> By traversing upwords in the claims using the "part of" property it is
>> possible to build a priority on the area involved. It is possible to
>> traverse "nationality" and a few other properties.
>>
>> Things directly noticeable like an area enclosed in an area using the
>> language is somewhat easy to identify, but things that are noticeable by
>> association with another noticeable thing is not. Like a Danish slave ship
>&

Re: [Wikidata] Bachelor's thesis on ArticlePlaceholder

2016-04-03 Thread John Erling Blad
> Red links are used frequently in Wikipedia to indicate an article which
is does
> not yet exist, but should. Today it leads the user to an empty create
article page.
> In the future it should instead bring them to an ArticlePlaceholder,
offering the
> option of creating an article. This is part of the topic of smart red
links, which is
> discussed in the section 8.1: Smart red links

It would be interesting to hear if someone has an idea of how this might
work. There are some attempts at this on nowiki; none of them seems to work
in all cases.

Note that "Extension:ArticlePlaceholder/Smart red links" doesn't really
solve the problem for existing redlinks, it solves the association problem
when the user tries to resolve the redlink. That is one step further down
the line, or more like solving the redlinks for a disambiguation page. ("I
know there is a page like this, named like so, on that specific project.")

Note also that an item is not necessarily described on any project, and
that creating an item on Wikidata can be outside the editor's scope or even
very difficult. Often we have a name of some "thing", but we only have a
sketchy idea about the thing itself. Check out
https://www.wikidata.org/wiki/Q12011301 for an example.

It seems like a lot of what has been done so far on redlinks is an attempt
to make pink-ish links with _some_ information, while the problem is that
redlinks have _no_ information. The core reason why we have redlinks is
that we lack the manpower to avoid them, and because of that we can't just
add "some information". It is not a question of which we need first, the
hen or the egg, as we have neither.

On Sun, Apr 3, 2016 at 4:27 PM, John Erling Blad <jeb...@gmail.com> wrote:

> Just read through the doc, and found some important points. I post each
> one in a separate mail.
>
> > Since it is hard to decide which content is actually notable, the items
> appear-
> > ing in the search should be limited to the ones having at least one
> statements
> > and two sitelinks to the same project (like Wikipedia or Wikivoyage).
>
> This is a good baseline, but figuring out what is notable locally is a bit
> more involved. A language is used in a local area, and within that area
> some items are more important just because they reside within the area.
> This is quite noticeable in the differences between nnwiki and nowiki which
> both basically covers "Norway". Also items that somehow relates to the
> local area or language is more noticeable than those outside those areas.
> By traversing upwords in the claims using the "part of" property it is
> possible to build a priority on the area involved. It is possible to
> traverse "nationality" and a few other properties.
>
> Things directly noticeable like an area enclosed in an area using the
> language is somewhat easy to identify, but things that are noticeable by
> association with another noticeable thing is not. Like a Danish slave ship
> operated by a Norwegian firm, the ship is thus noticeable in nowiki. I
> would say that all things linked as an item from other noticeable things
> should be included. Some would perhaps say that "items with second order
> relevance should be included".
>
>
> On Sat, Apr 2, 2016 at 11:09 PM, Luis Villa <l...@lu.is> wrote:
>
>> On Sat, Apr 2, 2016, 4:34 AM Lucie Kaffee <lucie.kaf...@wikimedia.de>
>> wrote:
>>
>>> I wrote my Bachelor's thesis on "Generating Article Placeholders from
>>> Wikidata for Wikipedia: Increasing Access to Free and Open Knowledge". The
>>> thesis summarizes a lot of the work done on the ArticlePlaceholder
>>> extension ( https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder
>>> )
>>>
>>> I uploaded the thesis to commons under a CC-BY-SA license- you can find
>>> it at
>>> https://commons.wikimedia.org/wiki/File:Generating_Article_Placeholders_from_Wikidata_for_Wikipedia_-_Increasing_Access_to_Free_and_Open_Knowledge.pdf
>>>
>>> I continue working on the extension and aim to deploy it to the first
>>> Wikipedias, that are interested, in the next months.
>>>
>>> I am happy to answer questions related to the extension!
>>>
>>
>> Great work on something that I *believe *has a lot of promise - thanks!
>> I really think this approach has a lot of promise to help take back some
>> readership from Google, and potentially in the long-run drive more new
>> editors as well. (I know that was part of the theory of LSJbot, though I
>> don't know if anyone has actually a/b tested that.)
>>
>> I was somewhat surprised to not see data collection discussed in Section
>> 8.10 - a

Re: [Wikidata] Bachelor's thesis on ArticlePlaceholder

2016-04-03 Thread John Erling Blad
Just read through the doc and found some important points. I'll post each
one in a separate mail.

> Since it is hard to decide which content is actually notable, the items
appear-
> ing in the search should be limited to the ones having at least one
statements
> and two sitelinks to the same project (like Wikipedia or Wikivoyage).

This is a good baseline, but figuring out what is notable locally is a bit
more involved. A language is used in a local area, and within that area
some items are more important just because they reside within the area.
This is quite noticeable in the differences between nnwiki and nowiki,
which both basically cover "Norway". Also, items that somehow relate to the
local area or language are more noticeable than those outside those areas.
By traversing upwards in the claims using the "part of" property it is
possible to build a priority for the area involved. It is also possible to
traverse "nationality" and a few other properties.

Things that are directly noticeable, like an area enclosed in an area using
the language, are somewhat easy to identify, but things that are noticeable
by association with another noticeable thing are not. Take a Danish slave
ship operated by a Norwegian firm: the ship is thus noticeable in nowiki. I
would say that all things linked as an item from other noticeable things
should be included. Some would perhaps say that "items with second-order
relevance should be included".
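
To make the idea concrete, here is a small Python sketch of how such a
locality priority could be built by walking "part of" (P361) and "country"
(P17) claims upwards. The claim graph, the item ids and the scoring are all
made up for illustration; a real implementation would of course query
Wikidata instead of a hand-made dict.

# Toy sketch: score how "local" an item is for a given wiki by walking
# upwards through "part of" (P361) and "country" (P17) claims.
CLAIMS = {
    "Q_small_town": {"P361": ["Q_county_x"]},
    "Q_county_x": {"P17": ["Q_norway"]},
    "Q_foreign_town": {"P17": ["Q_denmark"]},
    "Q_norway": {},
    "Q_denmark": {},
}

LOCAL_AREAS = {"Q_norway"}     # areas considered local for this wiki
TRAVERSED = ("P361", "P17")    # properties to follow upwards


def locality_score(item, depth=0, max_depth=4, seen=None):
    """Return a score > 0 if the item links, directly or indirectly, to a local area."""
    seen = set() if seen is None else seen
    if item in seen or depth > max_depth:
        return 0.0
    seen.add(item)
    if item in LOCAL_AREAS:
        return 1.0 / (depth + 1)   # closer links count more
    best = 0.0
    for prop in TRAVERSED:
        for target in CLAIMS.get(item, {}).get(prop, []):
            best = max(best, locality_score(target, depth + 1, max_depth, seen))
    return best


print(locality_score("Q_small_town"))    # 0.33..., local via the county
print(locality_score("Q_foreign_town"))  # 0.0 in this toy graph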


On Sat, Apr 2, 2016 at 11:09 PM, Luis Villa  wrote:

> On Sat, Apr 2, 2016, 4:34 AM Lucie Kaffee 
> wrote:
>
>> I wrote my Bachelor's thesis on "Generating Article Placeholders from
>> Wikidata for Wikipedia: Increasing Access to Free and Open Knowledge". The
>> thesis summarizes a lot of the work done on the ArticlePlaceholder
>> extension ( https://www.mediawiki.org/wiki/Extension:ArticlePlaceholder )
>>
>> I uploaded the thesis to commons under a CC-BY-SA license- you can find
>> it at
>> https://commons.wikimedia.org/wiki/File:Generating_Article_Placeholders_from_Wikidata_for_Wikipedia_-_Increasing_Access_to_Free_and_Open_Knowledge.pdf
>>
>> I continue working on the extension and aim to deploy it to the first
>> Wikipedias, that are interested, in the next months.
>>
>> I am happy to answer questions related to the extension!
>>
>
> Great work on something that I *believe *has a lot of promise - thanks! I
> really think this approach has a lot of promise to help take back some
> readership from Google, and potentially in the long-run drive more new
> editors as well. (I know that was part of the theory of LSJbot, though I
> don't know if anyone has actually a/b tested that.)
>
> I was somewhat surprised to not see data collection discussed in Section
> 8.10 - are there plans to do that? I would have expected to see a/b testing
> discussed as part of the deployment methodology, so that it could be
> compared both to the current baseline and also to similar approaches (like
> the ones you survey in Section 3).
>
> Thanks again for the hard work here-
>
> Luis
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] upcoming deployments/features

2016-02-03 Thread John Erling Blad
It is a bit strange to define a data type in terms of a library of
functions in another language.
Or is it just me who thinks this is a bit odd?

What about MathML?

On Wed, Feb 3, 2016 at 12:06 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> For a consumer, the main practical questions would be:
>
> (1) What subset of LaTeX exactly do you need to support to display the
> math expressions in Wikidata?
> (2) As a follow up: does MathJAX work to display this? If not, what does?
>
> Cheers,
>
> Markus
>
> On 02.02.2016 10:01, Moritz Schubotz wrote:
>
>> The string is interpreted by the math extension in the same way as the
>> Math extension interprets the text between the  tags.
>> There is an API to extract identifiers and the packages required to
>> render the input with regular latex from here:
>> http://api.formulasearchengine.com/v1/?doc
>> or also
>>
>> https://en.wikipedia.org/api/rest_v1/?doc#!/Math/post_media_math_check_type
>> (The wikipedia endpoint has been opened to the public just moments ago)
>> In the future, we are planning to provide additional semantics from there.
>> If you have additional questions, please contact me directly, since I'm
>> not a member on the list.
>> Moritz
>>
>> On Tue, Feb 2, 2016 at 8:53 AM, Lydia Pintscher
>> >
>> wrote:
>>
>> On Mon, Feb 1, 2016 at 8:44 PM Markus Krötzsch
>> > > wrote:
>>
>> On 01.02.2016 17:14, Lydia Pintscher wrote:
>>  > Hey folks :)
>>  >
>>  > I just sat down with Katie to plan the next important feature
>>  > deployments that are coming up this month. Here is the plan:
>>  > * new datatype for mathematical expressions: We'll get it live
>> on
>>  > test.wikidata.org 
>>  tomorrow and then bring it
>>  > to wikidata.org  
>> on the 9th
>>
>> Documentation? What will downstream users like us need to do to
>> support
>> this? How is this mapped to JSON? How is this mapped to RDF?
>>
>>
>> It is a string representing markup for the Math extension. You can
>> already test it here: http://wikidata.beta.wmflabs.org/wiki/Q117940.
>> See also https://en.wikipedia.org/wiki/Help:Displaying_a_formula.
>> Maybe Moritz wants to say  bit more as his students created the
>> datatype.
>>
>> Cheers
>> Lydia
>> --
>> Lydia Pintscher - http://about.me/lydia.pintscher
>> Product Manager for Wikidata
>>
>> Wikimedia Deutschland e.V.
>> Tempelhofer Ufer 23-24
>> 10963 Berlin
>> www.wikimedia.de 
>>
>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.
>> V.
>>
>> Eingetragen im Vereinsregister des Amtsgerichts
>> Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig
>> anerkannt durch das Finanzamt für Körperschaften I Berlin,
>> Steuernummer 27/029/42207.
>>
>>
>>
>>
>> --
>> Moritz Schubotz
>> TU Berlin, Fakultät IV
>> DIMA - Sekr. EN7
>> Raum EN742
>> Einsteinufer 17
>> D-10587 Berlin
>> Germany
>>
>> Tel.: +49 30 314 22784
>> Mobil.: +49 1578 047 1397
>> Fax:  +49 30 314 21601
>> E-Mail: schub...@tu-berlin.de 
>> Skype: Schubi87
>> ICQ: 200302764
>> Msn: mor...@schubotz.de
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Other sites

2016-02-03 Thread John Erling Blad
Is there any documentation on how Commons is handled? Especially how
additional links to the gallery/category are handled if sitelinks are added
to the opposite page? That is, if there is a sitelink to the gallery, what
happens then with the category link? Do we reimplement the additional link
with JavaScript?

On Wed, Feb 3, 2016 at 2:47 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Wed, Feb 3, 2016 at 1:48 PM Emilio J. Rodríguez-Posada <
> emi...@gmail.com> wrote:
>
>> Hello;
>>
>> What sites are allowed in the item "other sites" section? I haven't found
>> documentation about it.
>>
>
> Currently MediaWiki, Wikispecies, Meta, Wikidata and Commons are in this
> section.
>
>
>> I would suggest to allow links to other wikis in the Internet, that way
>> we could create a network of wikis, although I know that some wikis are
>> controversial and don't follow the Wikipedia guidelines.
>>
>
> Links to sites outside Wikimedia are handled via statements.
>
>
> Cheers
> Lydia
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
> für Körperschaften I Berlin, Steuernummer 27/029/42207.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Duplicates in Wikidata

2015-12-27 Thread John Erling Blad
There are also a lot of errors/duplicates in WorldCat.

On Sun, Dec 27, 2015 at 12:43 PM, Gerard Meijssen  wrote:

> Hoi,
> Probably :)
> Thanks,
>  Gerard
>
> On 27 December 2015 at 12:31, Federico Leva (Nemo) 
> wrote:
>
>> Is this something for a Wikidata game? :)
>>
>> Nemo
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Units

2015-12-20 Thread John Erling Blad
Can someone explain why the development of units is so difficult, or what
the problem seems to be? Is there anything other people can do?

It seems to me like this has serious feature creep...

https://phabricator.wikimedia.org/T77977
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Units

2015-12-20 Thread John Erling Blad
Sorry for the long rant!

1. How do I get/set the list of base units? (Try "foot" and then ask
yourself "which foot is this?" [4])
2. How do I get/set derived units? (Siemens is the inverse of ohm, that is,
S=Ω⁻¹ [3])
3. How do I add prefixed units? (1kΩ and 1mΩ, and note there are a bunch of
non-standard prefixes - not to forget localized ones! [5] I hate the mess
on Wikipedia... And note the mess with the kilogram [7])
4. How do I normalize a unit? (It is (nearly) always µF, even when you write
4700µF [6] - this text is so messy and it does not really address the
problem)
5. Is there any plan to handle deprecated units? (The weight prototype
gaining weight [1] and the newly proposed standard [2] are one known
problem)
6. How do I disambiguate units? (The feet problem in another version)
7. Is there any plan to add warnings about units that need disambiguation?
(The feet problem is well known, but how about the kilogram? And note that
it is the kilogram that is the standard unit, not the gram.)
8. How do I handle incompatibilities between unit systems? (You can't
convert some old units to newer ones.)

On 1, perhaps we could make Wikidata entries for the different feet, but
then the lookup list will be very long. Old classical units for length,
area, volume, and weight are the biggest problems. Some of them also
coincide with 8, as accurate conversion isn't possible.

On 2, some derived units can be transformed from one form into another.
Siemens is one of them. Others can be expressed in different ways, but all
variants are really just one and the same. We could use aliases for this;
for example, farad (F) is s⁴A²⋅m⁻²⋅kg⁻¹ and a bunch of other forms. Another
solution is to cluster the descriptions, which also makes a better solution
for 1.
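
To illustrate what I mean, here is a toy Python sketch that represents
derived units as exponent vectors over (a subset of) the SI base units, so
that "the same unit written differently" normalises to one form. The unit
table is hand-made and far from complete.

# Exponents over the base units (kg, m, s, A); illustrative only.
UNITS = {
    "ohm":     (1, 2, -3, -2),      # Ω = kg·m²·s⁻³·A⁻²
    "siemens": (-1, -2, 3, 2),      # S = kg⁻¹·m⁻²·s³·A², i.e. Ω⁻¹
    "farad":   (-1, -2, 4, 2),      # F = kg⁻¹·m⁻²·s⁴·A²
}

def invert(vec):
    return tuple(-e for e in vec)

def same_dimension(a, b):
    return UNITS[a] == UNITS[b]

print(same_dimension("siemens", "ohm"))          # False: different units...
print(UNITS["siemens"] == invert(UNITS["ohm"]))  # ...but S really is Ω⁻¹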

On 3, the simple solution is to add the SI prefixes to everything. It
almost works, except that we have units like the kilogram (kg), which
should retain the "k". It will also create problems with kph and a bunch of
other such localized units.
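
Roughly what I have in mind is something like this Python sketch, with the
kilogram handled as the odd one out. The prefix table is truncated and
localized spellings like "kph" are deliberately ignored.

# Rough sketch of SI prefix handling; prefixes attach to the gram even
# though the base unit of mass is the kilogram.
PREFIXES = {"G": 1e9, "M": 1e6, "k": 1e3, "": 1.0,
            "m": 1e-3, "µ": 1e-6, "n": 1e-9}

def to_base(value, prefix, unit):
    """Convert e.g. (4700, 'µ', 'F') to the value in the unprefixed unit."""
    if unit == "g":                                 # mass: base unit is kg, not g
        return value * PREFIXES[prefix] / 1e3, "kg"
    return value * PREFIXES[prefix], unit

print(to_base(4700, "µ", "F"))   # (0.0047, 'F')
print(to_base(1, "k", "g"))      # (1.0, 'kg')
print(to_base(1, "m", "Ω"))      # (0.001, 'Ω')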

On 4, don't confuse normalization of the unit with normalization of the
value. Normalization of the unit is highly domain specific.

On 5, note that there is a subtle difference here between a unit that goes
out of common use and a unit that is deprecated through law. Not sure if we
need to differentiate those; I hope not!

On 6, I think foot is a good example of how long this list can be. Note
that in some countries different trade unions used different lengths of a
foot, and some cities even defined their own foot. I would like to define
my foot as the new standard unit.

On 7, note that the accuracy (error bounds) on the number should trigger a
need for disambiguation. Also note that precision implies a set level of
accuracy. Accuracy and precision are not the same, but precision can be
used as a proxy for accuracy.

On 8, there are several posts about this problem. Some claim you can avoid
the problem by setting the accuracy in the conversion sufficiently high. I
don't think that would be a valid solution. Perhaps we should have a
property for valid conversions, with constants for each one of them and
with proper error bounds. If a conversion isn't listed, then it isn't valid.
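
A toy Python sketch of that idea, where a conversion is only valid if it is
explicitly listed and every listed conversion carries its own error bound.
The factors are illustrative; the old Norwegian foot value in particular is
only approximate.

# Sanctioned conversions only: (from, to) -> (factor, absolute uncertainty).
CONVERSIONS = {
    ("foot_international", "metre"): (0.3048, 0.0),     # defined exactly
    ("foot_norwegian_old", "metre"): (0.31374, 5e-5),    # historical, uncertain
}

def convert(value, src, dst):
    try:
        factor, err = CONVERSIONS[(src, dst)]
    except KeyError:
        raise ValueError("no sanctioned conversion from %s to %s" % (src, dst))
    return value * factor, value * err   # converted value and its error bound

print(convert(10, "foot_international", "metre"))   # (3.048, 0.0)
print(convert(10, "foot_norwegian_old", "metre"))   # (3.1374, 0.0005)
# convert(10, "foot_norwegian_old", "foot_international") raises ValueError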

[1] http://www.livescience.com/26017-kilogram-gained-weight.html
[2]
http://www.dailymail.co.uk/sciencetech/article-3161130/Reinventing-kilogram-Official-unit-weight-measurement-new-accurate-definition-following-breakthrough.html
[3] https://en.wikipedia.org/wiki/Siemens_%28unit%29
[4] https://en.wikipedia.org/wiki/Foot_%28unit%29
[5] https://en.wikipedia.org/wiki/Decametre
[6] https://www.westfloridacomponents.com/blog/is-mf-mfd-the-same-as-uf/
[7] http://www.bipm.org/en/bipm/mass/ipk/

On Sun, Dec 20, 2015 at 11:57 AM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Sun, Dec 20, 2015 at 10:08 AM, John Erling Blad <jeb...@gmail.com>
> wrote:
> > Can someone give an explanation why development of units are so
> difficult,
> > or what seems to be the problem? Is there anything other people can do?
> >
> > It seems to me like this has a serious feature creep...
> >
> > https://phabricator.wikimedia.org/T77977
>
> We have done the minimum version and deployed it. You're able to enter
> and retrieve information with quantities and units. Now that the
> minimum is in place other things got higher priority. That was/is
> mainly data quality, properly linking to other sources in out export
> formats and a UI cleanup including separating out identifiers. Those
> are still in progress. Once we've brought those further along we'll
> pick up the remaining work for units as well.
> The main thing that is left now is unit conversion for the query service.
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
>

Re: [Wikidata] Units

2015-12-20 Thread John Erling Blad
Ok, still not working properly.

On Sun, Dec 20, 2015 at 5:23 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Sun, Dec 20, 2015 at 5:08 PM, John Erling Blad <jeb...@gmail.com>
> wrote:
> > Sorry for long rant!
> >
> > 1. How do I get/set the list of base units (try "foot" and then ask
> yourself
> > "which foot is this?" [4])
> > 2. How do I get/set derived units (Siemens is the inverse of Ohm, that is
> > S=Ω⁻¹ [3])
> > 3. How do I add prefixed units (1kΩ, and 1mΩ, and note there is a bunch
> of
> > non-standard prefixes - not to forget localized ones! [5] I hate the
> mess on
> > Wikipedia... And note the mess with kilogram[7])
> > 4. How to normalize a unit (it is (nearly) always µF, even when you write
> > 4700µF [6] - this text is so messy and it does not really address the
> > problem)
> > 5. Is there any plan to handle deprecated units (the weight prototype
> > gaining weight[1], and the new proposed standard [2] is one known problem
> > 6. How to disambiguate units (the feet-problem in another version)
> > 7. Is there any plan to add warnings about units that needs
> disambiguation
> > (the feet-problem is well-known, but how about kilogram? And note that is
> > the kilogram that is the standard unit, not the gram.)
> > 8. How to handle incompatibilities between unit systems (you can't
> convert
> > some old units to newer ones.)
>
> That's why I said a minimal version is live. In due time we'll get to
> these but they're not more important than the other things I
> mentioned.
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] enabling in other projects sidebar on all wikis in January

2015-12-20 Thread John Erling Blad
+1, and let all the templates that mimic such linkage die in flames!

2015-12-20 12:41 GMT+01:00 Lydia Pintscher :

> Oh and one thing I forgot: A big thank you to Tpt who did most of the
> development for this feature.
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Photographers' Identities Catalog (& WikiData)

2015-12-15 Thread John Erling Blad
There are some pretty good methods for optimizing the match process, but I
have not seen any implementation of them against Wikidata items. The only
things I've seen are some opportunistic methods: duck tests gone wrong, or
"Darn, it was a platypus!"

On Mon, Dec 14, 2015 at 11:19 PM, André Costa 
wrote:

> I'm planning to bring a few of the datasets into mix'n'match (@Magnus this
> is the one I asked sbout on Twitter) in January but not all of them are
> suitable and I believe separating KulturNav into multiple datasets on
> mix'n'match maxes more sense and makes it more likely that they get matched.
>
> Some of the early adopters of KulturNav have been working with WMSE to
> facilitate bi-directional matching. This is done on a dataset-by-dataset
> level since different institutions are responsible for different datasets.
> My hope is that mix'n'match will help in this area as well, even as a tool
> for the institutions own staff who are often interested in matching entries
> to Wikipedia (which most of the time means wikidata).
>
> @John: There are processes for matching kulturnav identifiers to wikidata
> entities. Only afterwards are details imported. Mainly to source statements
> [1] and [2]. There is some (not so user friendly) stats at [3].
>
> Cheers,
> André
>
> [1]
> https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/L_PBot_2
> [2]
> https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/L_PBot_3
> [3] https://tools.wmflabs.org/lp-tools/misc/data/
> --
> André Costa
> GLAM developer
> Wikimedia Sverige
>
> Magnus Manske, 13/12/2015 11:24:
>
> >
> > Since no one mentioned it, there is a tool to do the matching to WD much
> > more efficiently:
> > https://tools.wmflabs.org/mix-n-match/
> 
>
> +1
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Photographers' Identities Catalog (& WikiData)

2015-12-09 Thread John Erling Blad
I think the Norwegian lists are a subset of Preus Photo Museum's list. It
is now maintained partly by Nasjonalbiblioteket (the Norwegian one, not the
Swedish one) and Norsk Lokalhistorisk Institutt. For example: Anders Beer
Wilse in nowiki,[1] at Lokalhistoriewiki,[2] and at Nasjonalbiblioteket.[3]

Kulturnav is a kind of maintained ontology, where most of the work is done
by local museums. The software for the site itself was made (in part) with
a grant from Norsk Kulturråd.

We should connect as much as possible of our resources to resources at
Kulturnav, and not just copy data. That said, we don't have a very good
model for how to materialize data from external sites and make it available
to our client sites, so our only option is more or less to copy. It is
better to maintain data in one location.

[1] https://no.wikipedia.org/wiki/Anders_Beer_Wilse
[2] https://lokalhistoriewiki.no/index.php/Anders_Beer_Wilse
[3] http://www.nb.no/nmff/fotograf.php?fotograf_id=3050

On Wed, Dec 9, 2015 at 9:51 PM, André Costa 
wrote:

> Happy to be of use. There is also one for:
> * Swedish photo studios [1]
> * Norwegian photographers[2]
> * Norwegian photo studios [3]
> I'm less familiar with these though and don't have a timeline for wikidata
> integration.
>
> Cheers,
> André
>
> [1] http://kulturnav.org/deb494a0-5457-4e5f-ae9b-e1826e0de681
> [2] http://kulturnav.org/508197af-6e36-4e4f-927c-79f8f63654b2
> [3] http://kulturnav.org/7d2a01d1-724c-4ad2-a18c-e799880a0241
> --
> André Costa
> GLAM developer
> Wikimedia Sverige
> On 9 Dec 2015 15:07, "David Lowe"  wrote:
>
>> Thanks, André! I don't know that I've found that before. Great to get
>> country (or region) specific lists like this.
>> D
>>
>> On Wednesday, December 9, 2015, André Costa 
>> wrote:
>>
>>> In case you haven't come across it before
>>> http://kulturnav.org/1f368832-7649-4386-97b6-ae40cce8752b is the entry
>>> point to the Swedish database of (primarily early) photographers curated by
>>> the Nordic Museum in Stockholm.
>>>
>>> It's not that well integrated into Wikidata yet but the plan is to fix
>>> that during early 2016. That would also allow a variety of photographs on
>>> Wikimedia Commons to be linked to these entries.
>>>
>>> Cheers,
>>> André
>>>
>>> André Costa | GLAM developer, Wikimedia Sverige |
>>> andre.co...@wikimedia.se | +46 (0)733-964574
>>>
>>> Stöd fri kunskap, bli medlem i Wikimedia Sverige.
>>> Läs mer på blimedlem.wikimedia.se
>>>
>>> On 9 December 2015 at 02:44, David Lowe  wrote:
>>>
 Thanks, Tom.
 I'll have to look at this specific case when I'm back at work tomorrow,
 as it does seem you found something in error.
 As for my process: with WD, I queried out the label, description &
 country of citizenship, dob & dod of of everyone with occupation:
 photographer. After some cleaning, I can get the WD data formatted like my
 own (Name, Nationality, Dates). I can then do a simple match, where
 everything matches exactly. For the remainder, I then match names and
 dates- without Nationality, which is often very "soft" information. For
 those that pass a smell test (one is "English" the other is "British") I
 pass those along, too. For those with greater discrepancies, I look still
 closer. For those with still greater discrepancies, I manually,
 individually query my database for anyone with the same last name & same
 first initial to catch misspellings or different transliterations. I also
 occasionally put my entire database into open refine to catch instances
 where, for instance, a Chinese name has been given as FamilyName, GivenName
 in one source, and GivenName, FamilyName in another.
 In short, this is scrupulously- and manually- checked data. I'm not
 savvy enough to let an algorithm make my mistakes for me! But let me know
 if this seems to be more than bad luck of the draw- finding the conflicting
 data you found.
 I have also to say, I may suppress the Niepce Museum collection, as
 it's from a really crappy list of photographers in their collection which I
 found many years ago, and can no longer find. I don't want to blame them
 for the discrepancy, but that might be the source. I don't know.
 As I start to query out places of birth & death from WD in the next
 days, I expect to find more discrepancies. (Just today, I found dozens of
 folks whom ULAN gendered one way, and WD another- but were undeniably the
 same photographer. )
 Thanks,
 David


 On Tuesday, December 8, 2015, Tom Morris  wrote:

> Can you explain what "indexing" means in this context?  Is there some
> type of matching process?  How are duplicates resolved, if at all? Was the
> Wikidata info extracted from a dump or one of the APIs?
>
> When I looked at the first person I picked at 

Re: [Wikidata] [Wikimedia-l] Quality issues

2015-12-01 Thread John Erling Blad
I for one had some discussions with Denny about licensing, and even if it
hurt my feelings to say this (at least two of them), he was right. Facts
can't be copyrighted, and because of that CC0 is the natural choice for the
data in the database.

Still, in Europe databases can be given protection, and that can limit
access to the site. By using the CC0 license on the whole thing, reuse is
much easier.

Database protection and copyright are different issues and should not be
mixed.

John

On Wed, Dec 2, 2015 at 12:43 AM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> [I continue cross-posting for this reply, but it would make sense to
> return the thread to the Wikidata list where it started, so as to avoid
> partial discussions happening in many places.]
>
>
> Andreas,
>
> On 27.11.2015 12:08, Andreas Kolbe wrote:
>
>> Gerard,
>>
>
> (I should note that my reply has nothing to do with what Gerard said, or
> to the high-level "quality" debate in this thread.)
>
> [...]
>
> Wikipedia content is considered a reliable source in Wikidata, and
>> Wikidata content is used as a reliable source by Google, where it
>> appears without any indication of its provenance.
>>
>
> This prompted me to reply. I wanted to write an email that merely says:
>
> "Really? Where did you get this from?" (Google using Wikidata content)
>
> But then I read the rest ... so here you go ...
>
>
> Your email mixes up many things and effects, some of which are important
> issues (e.g., the fact that VIAF is not a primary data source that should
> be used in citations). Many other of your remarks I find very hard to take
> serious, including but not limited to the following:
>
> * A rather bizarre connection between licensing models and accountability
> (as if it would make content more credible if you are legally required to
> say that you found it on Wikipedia, or even give a list of user names and
> IPs who contributed)
> * Some stories that I think you really just made up for the sake of
> argument (Denny alone has picked the Wikidata license? Google displays
> Wikidata content? Bing is fuelled by Wikimedia?)
> * Some disjointed remarks about the history of capitalism
> * The assertion that content is worse just because the author who created
> it used a bot for editing
> * The idea that engineers want to build systems with bad data because they
> like the challenge of cleaning it up -- I mean: really! There is nothing
> one can even say to this.
> * The complaint that Wikimedia employs too much engineering expertise and
> too little content expertise (when, in reality, it is a key principle of
> Wikimedia to keep out of content, and communities regularly complain WMF
> would still meddle too much).
> * All those convincing arguments you make against open, anonymous editing
> because of it being easy to manipulate (I've heard this from Wikipedia
> critics ten years ago; wonder what became of them)
> * And, finally, the culminating conspiracy theory of total control over
> political opinion, destroying all plurality by allowing only one viewpoint
> (not exactly what I observe on the Web ...) -- and topping this by blaming
> it all on the choice of a particular Creative Commons license for Wikidata!
> Really, you can't make this up.
>
> Summing up: either this is an elaborate satire that tries to test how
> serious an answer you will get on a Wikimedia list, or you should
> *seriously* rethink what you wrote here, take back the things that are
> obviously bogus, and have a down-to-earth discussion about the topics you
> really care about (licenses and cyclic sourcing on Wikimedia projects, I
> guess; "capitalist companies controlling public media" should be discussed
> in another forum).
>
> Kind regards,
>
> Markus
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] REST API for Wikidata

2015-11-30 Thread John Erling Blad
If you are using the P/Q/whatever markers in the id, then you should not
differentiate between items and properties in the root.

The path /items/{item_id}/data/{property_label} should use the property id
and not the property label. The latter is not stable.
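
A small Python sketch of why this matters; the routes and data here are
hypothetical and not the actual endpoints of the demo API.

# Labels drift, ids do not.
ITEM = {"id": "Q42", "statements": {"P31": ["Q5"], "P106": ["Q36180"]}}
LABELS = {"P31": "instance of", "P106": "occupation"}   # can be edited at any time

def get_by_property_id(item, pid):
    # stable: /items/Q42/data/P106 keeps working even if the label changes
    return item["statements"].get(pid)

def get_by_property_label(item, label):
    # fragile: breaks as soon as someone renames "occupation"
    pid = next((p for p, l in LABELS.items() if l == label), None)
    return item["statements"].get(pid) if pid else None

print(get_by_property_id(ITEM, "P106"))           # ['Q36180']
print(get_by_property_label(ITEM, "occupation"))  # ['Q36180'], until the label moves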

On Mon, Nov 30, 2015 at 2:55 PM, Jeroen De Dauw 
wrote:

> Hey all,
>
> I've created a very rough REST API for Wikidata and am looking for your
> feedback.
>
> * About this API: http://queryr.wmflabs.org
> * Documentation: http://queryr.wmflabs.org/about/docs
> * API root: http://queryr.wmflabs.org/api
>
> At present this is purely a demo. The data it serves is stale and
> potentially incomplete, the endpoints and formats they use are very much
> liable to change, the server setup is not reliable and I'm not 100% sure
> I'll continue with this little project.
>
> The main thing I'm going for with this API compared to the existing one is
> greater ease of use for common use cases. Several factors make this a lot
> easier to do in a new API than in the existing one: no need the serve all
> use cases, no need to retain compatibility with existing users and no
> framework imposed restrictions. You can read more about the difference on
> the website.
>
> You are invited to comment on the concept and on the open questions
> mentioned on the website.
>
> Cheers
>
> --
> Jeroen De Dauw - http://www.bn2vs.com
> Software craftsmanship advocate
> ~=[,,_,,]:3
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] REST API for Wikidata

2015-11-30 Thread John Erling Blad
It seems like you filter site ids on language; I don't think this is
correct behaviour.

On Mon, Nov 30, 2015 at 5:38 PM, John Erling Blad <jeb...@gmail.com> wrote:

> If you are using the P/Q/whatever markers in the id, then you should not
> differentiate on items and properties in the root.
>
> The path /items/{item_id}/data/{property_label} should use the property id
> and not the property label. The later is not stable.
>
> On Mon, Nov 30, 2015 at 2:55 PM, Jeroen De Dauw <jeroended...@gmail.com>
> wrote:
>
>> Hey all,
>>
>> I've created a very rough REST API for Wikidata and am looking for your
>> feedback.
>>
>> * About this API: http://queryr.wmflabs.org
>> * Documentation: http://queryr.wmflabs.org/about/docs
>> * API root: http://queryr.wmflabs.org/api
>>
>> At present this is purely a demo. The data it serves is stale and
>> potentially incomplete, the endpoints and formats they use are very much
>> liable to change, the server setup is not reliable and I'm not 100% sure
>> I'll continue with this little project.
>>
>> The main thing I'm going for with this API compared to the existing one
>> is greater ease of use for common use cases. Several factors make this a
>> lot easier to do in a new API than in the existing one: no need the serve
>> all use cases, no need to retain compatibility with existing users and no
>> framework imposed restrictions. You can read more about the difference on
>> the website.
>>
>> You are invited to comment on the concept and on the open questions
>> mentioned on the website.
>>
>> Cheers
>>
>> --
>> Jeroen De Dauw - http://www.bn2vs.com
>> Software craftsmanship advocate
>> ~=[,,_,,]:3
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Use-notes in item descriptions

2015-11-05 Thread John Erling Blad
Descriptions are a clarification, like the parenthesis form on Wikipedia,
but extended and formalized. Use notes should not be put into this field.

John

On Thu, Nov 5, 2015 at 6:19 PM, James Heald  wrote:

> The place where these hints are vital is in the tool-tips that come up
> when somebody is inputting the value of a property.
>
> It's a quick message to say "don't use that item, use this other item".
>
> A section on the talk page simply doesn't cover it.
>
> I suppose one could create a community property, as you suggest, but as
> you say the challenge would be then making sure the system software
> presented it when it was needed.  I suspect that things intended to be
> presented by the system software are better created as system properties.
>
>-- James,
>
>
>
>
> On 05/11/2015 16:21, Benjamin Good wrote:
>
>> A section in the talk page associated with the article in question would
>> seem to solve this (definitely real) problem? - assuming that a would-be
>> editor was aware of the talk page.
>> Alternatively, you could propose a generic property with a text field that
>> could be added to items on an as-needed basis without any change to the
>> current software.  Again though, the challenge would be getting the
>> information in front of the user/editor at the right point in time.
>>
>>
>> On Thu, Nov 5, 2015 at 2:16 AM, Jane Darnell  wrote:
>>
>> Yes I have noticed this need for use notes, but it is specific to
>>> properties, isn't it? I see it in things such as choosing what to put in
>>> the "genre" property of an artwork. It would be nice to have some sort of
>>> pop-up that you can fill with more than what you put in. For example I
>>> get
>>> easily confused when I address the relative (as in kinship) properties;
>>> "father of the subject" is clear, but what about cousin/nephew etc.? You
>>> need more explanation room than can be stuffed in the label field to fit
>>> in
>>> the drop down. I have thought about this, but don't see any easy solution
>>> besides what you have done.
>>>
>>> On Thu, Nov 5, 2015 at 10:51 AM, James Heald  wrote:
>>>
>>> I have been wondering about the practice of putting use-notes in item
 descriptions.

 For example, on Q6581097 (male)
https://www.wikidata.org/wiki/Q6581097
 the (English) description reads:
"human who is male (use with Property:P21 sex or gender). For
 groups of males use with subclass of (P279)."

 I have added some myself recently, working on items in the
 administrative
 structure of the UK -- for example on Q23112 (Cambridgeshire)
 https://www.wikidata.org/wiki/Q23112
 I have changed the description to now read
 "ceremonial county of England (use Q21272276 for administrative
 non-metropolitan county)"

 These "use-notes" are similar to the disambiguating hat-notes often
 found
 at the top of articles on en-wiki and others; and just as those
 hat-notes
 can be useful on wikis, so such use-notes can be very useful on
 Wikidata,
 for example in the context of a search, or a drop-down menu.

 But...

 Given that the label field is also there to be presentable to end-users
 in contexts outside Wikidata, (eg to augment searches on main wikis, or
 to
 feed into the semantic web, to end up being used in who-knows-what
 different ways), yet away from Wikidata a string like "Q21272276" will
 typically have no meaning. Indeed there may not even be any distinct
 thing
 corresponding to it.  (Q21272276 has no separate en-wiki article, for
 example).

 So I'm wondering whether these rather Wikidata-specific use notes do
 really belong in the general description field ?

 Is there a case for moving them to a new separate use-note field created
 for them?

 The software could be adjusted to include such a field in search results
 and drop-downs and the item summary, but they would be a separate
 data-entry field on the item page, and a separate triple for the SPARQL
 service, leaving the description field clean of Wikidata-specific
 meaning,
 better for third-party and downstream applications.

 Am I right to feel that the present situation of just chucking
 everything
 into the description field doesn't seem quite right, and we ought to
 take a
 step forward from it?

-- James.

 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata


>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>>
>>
>>
>> ___
>> Wikidata mailing list
>> 

Re: [Wikidata] Importing Freebase (Was: next Wikidata office hour)

2015-09-29 Thread John Erling Blad
Yes! +1

On Mon, Sep 28, 2015 at 11:27 PM, Denny Vrandečić <vrande...@gmail.com>
wrote:

> Actually, my suggestion would be to switch on Primary Sources as a default
> tool for everyone. That should increase exposure and turnover, without
> compromising quality of data.
>
>
>
> On Mon, Sep 28, 2015 at 2:23 PM Denny Vrandečić <vrande...@google.com>
> wrote:
>
>> Hi Gerard,
>>
>> given the statistics you cite from
>>
>> https://tools.wmflabs.org/wikidata-primary-sources/status.html
>>
>> I see that 19.6k statements have been approved through the tool, and 5.1k
>> statements have been rejected - which means that about 1 in 5 statements is
>> deemed unsuitable by the users of primary sources.
>>
>> Given that there are 12.4M statements in the tool, this means that about
>> 2.5M statements will turn out to be unsuitable for inclusion in Wikidata
>> (if the current ratio holds). Are you suggesting to upload all of these
>> statements to Wikidata?
>>
>> Tpt already did upload pieces of the data which have sufficient quality
>> outside the primary sources tool, and more is planned. But for the data
>> where the suitability for Wikidata seems questionable, I would not know
>> what other approach to use. Do you have a suggestion?
>>
>> Once you have a suggestion and there is community consensus in doing it,
>> no one will stand in the way of implementing that suggestion.
>>
>> Cheers,
>> Denny
>>
>>
>> On Mon, Sep 28, 2015 at 1:19 PM John Erling Blad <jeb...@gmail.com>
>> wrote:
>>
>>> Another; make a kind of worklist on Wikidata that reflect the watchlist
>>> on the clients (Wikipedias) but then, we often have items on our watchlist
>>> that we don't know much about. (Digression: Somehow we should be able to
>>> sort out those things we know (the place we live, the persons we have meet)
>>> from those things we have done (edited, copy-pasted).)
>>>
>>> I been trying to get some interest in the past for worklists on
>>> Wikipedia, it isn't much interest to make them. It would speed up tedious
>>> tasks of finding the next page to edit after a given edit is completed. It
>>> is the same problem with imports from Freebase on Wikidata, locate the next
>>> item on Wikidata with the same queued statement from Freebase, but within
>>> some worklist that the user has some knowledge about.
>>>
>>> Imagine "municipalities within a county" or "municipalities that is also
>>> on the users watchlist", and combine that with available unhandled
>>> Freebase-statements.
>>>
>>> On Mon, Sep 28, 2015 at 10:09 PM, John Erling Blad <jeb...@gmail.com>
>>> wrote:
>>>
>>>> Could it be possible to create some kind of info (notification?) in a
>>>> wikipedia article that additional data is available in a queue ("freebase")
>>>> somewhere?
>>>>
>>>> If you have the article on your watch-list, then you will get a warning
>>>> that says "You lazy boy, get your ass over here and help us out!" Or
>>>> perhaps slightly rephrased.
>>>>
>>>> On Mon, Sep 28, 2015 at 4:52 PM, Markus Krötzsch <
>>>> mar...@semantic-mediawiki.org> wrote:
>>>>
>>>>> Hi Gerard, hi all,
>>>>>
>>>>> The key misunderstanding here is that the main issue with the Freebase
>>>>> import would be data quality. It is actually community support. The goal 
>>>>> of
>>>>> the current slow import process is for the Wikidata community to "adopt"
>>>>> the Freebase data. It's not about "storing" the data somewhere, but about
>>>>> finding a way to maintain it in the future.
>>>>>
>>>>> The import statistics show that Wikidata does not currently have
>>>>> enough community power for a quick import. This is regrettable, but not
>>>>> something that we can fix by dumping in more data that will then be
>>>>> orphaned.
>>>>>
>>>>> Freebase people: this is not a small amount of data for our young
>>>>> community. We really need your help to digest this huge amount of data! I
>>>>> am absolutely convinced from the emails I saw here that none of the former
>>>>> Freebase editors on this list would support low quality standards. They
>>>>> have fought hard to fix errors and avoid issues coming into their data

Re: [Wikidata] Importing Freebase (Was: next Wikidata office hour)

2015-09-28 Thread John Erling Blad
Would it be possible to create some kind of info (a notification?) in a
Wikipedia article saying that additional data is available in a queue
("Freebase") somewhere?

If you have the article on your watchlist, then you would get a warning
that says "You lazy boy, get your ass over here and help us out!" Or
perhaps slightly rephrased.

On Mon, Sep 28, 2015 at 4:52 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> Hi Gerard, hi all,
>
> The key misunderstanding here is that the main issue with the Freebase
> import would be data quality. It is actually community support. The goal of
> the current slow import process is for the Wikidata community to "adopt"
> the Freebase data. It's not about "storing" the data somewhere, but about
> finding a way to maintain it in the future.
>
> The import statistics show that Wikidata does not currently have enough
> community power for a quick import. This is regrettable, but not something
> that we can fix by dumping in more data that will then be orphaned.
>
> Freebase people: this is not a small amount of data for our young
> community. We really need your help to digest this huge amount of data! I
> am absolutely convinced from the emails I saw here that none of the former
> Freebase editors on this list would support low quality standards. They
> have fought hard to fix errors and avoid issues coming into their data for
> a long time.
>
> Nobody believes that either Freebase or Wikidata can ever be free of
> errors, and this is really not the point of this discussion at all [1]. The
> experienced community managers among us know that it is not about the
> amount of data you have. Data is cheap and easy to get, even free data with
> very high quality. But the value proposition of Wikidata is not that it can
> provide storage space for lot of data -- it is that we have a functioning
> community that can maintain it. For the Freebase data donation, we do not
> seem to have this community yet. We need to find a way to engage people to
> do this. Ideas are welcome.
>
> What I can see from the statistics, however, is that some users (and I
> cannot say if they are "Freebase users" or "Wikidata users" ;-) are putting
> a lot of effort into integrating the data already. This is great, and we
> should thank these people because they are the ones who are now working on
> what we are just talking about here. In addition, we should think about
> ways of engaging more community in this. Some ideas:
>
> (1) Find a way to clean and import some statements using bots. Maybe there
> are cases where Freebase already had a working import infrastructure that
> could be migrated to Wikidata? This would also solve the community support
> problem in one way. We just need to import the maintenance infrastructure
> together with the data.
>
> (2) Find a way to expose specific suggestions to more people. The Wikidata
> Games have attracted so many contributions. Could some of the Freebase data
> be solved in this way, with a dedicated UI?
>
> (3) Organise Freebase edit-a-thons where people come together to work
> through a bunch of suggested statements.
>
> (4) Form wiki projects that discuss a particular topic domain in Freebase
> and how it could be imported faster using (1)-(3) or any other idea.
>
> (5) Connect to existing Wiki projects to make them aware of valuable data
> they might take from Freebase.
>
> Freebase is a much better resource than many other data resources we are
> already using with similar approaches as (1)-(5) above, and yet it seems
> many people are waiting for Google alone to come up with a solution.
>
> Cheers,
>
> Markus
>
> [1] Gerard, if you think otherwise, please let us know which error rates
> you think are typical or acceptable for Freebase and Wikidata,
> respectively. Without giving actual numbers you just produce empty strawman
> arguments (for example: claiming that anyone would think that Wikidata is
> better quality than Freebase and then refuting this point, which nobody is
> trying to make). See https://en.wikipedia.org/wiki/Straw_man
>
>
> On 26.09.2015 18:31, Gerard Meijssen wrote:
>
>> Hoi,
>> When you analyse the statistics, it shows how bad the current state of
>> affairs is. Slightly over one in a thousanths of the content of the
>> primary sources tool has been included.
>>
>> Markus, Lydia and myself agree that the content of Freebase may be
>> improved. Where we differ is that the same can be said for Wikidata. It
>> is not much better and by including the data from Freebase we have a
>> much improved coverage of facts. The same can be said for the content of
>> DBpedia probably other sources as well.
>>
>> I seriously hate this procrastination and the denial of the efforts of
>> others. It is one type of discrimination that is utterly deplorable.
>>
>> We should concentrate on comparing Wikidata with other sources that are
>> maintained. We should do this repeatedly and concentrate on workflows
>> that seek the differences and provide 

Re: [Wikidata] Italian Wikipedia imports gone haywire ?

2015-09-28 Thread John Erling Blad
Probability of detection (PoD) is central to fighting vandalism, and that
does not imply making the vandalism less visible.

Symmetric statements make vandalism appear in more places, making it more
visible and thereby increasing the chance of detection.

If you isolate the vandalism it will be less visible, but then it will be
more likely that no one will ever spot it.

And yes, PoD is a military thingy and as such is disliked by the wiki
communities. Still, sometimes it is wise to check out what is actually
working and why it is working.
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Italian Wikipedia imports gone haywire ?

2015-09-28 Thread John Erling Blad
Depending on bots to set up symmetric relations is one of the things I find
very weird in Wikidata. That creates a situation where a user does an edit,
and a bot later overrides the user's previous edit. It is exactly the same
race condition that we fought earlier with the iw-bots, but now replicated
in Wikidata - a system that was supposed to remove the problem.

/me dumb, me confused.. o_O



On Mon, Sep 28, 2015 at 5:12 PM, Daniel Kinzler  wrote:

> Am 28.09.2015 um 16:43 schrieb Thomas Douillard:
> > Daniel Wrote:
> >> (*) This follows the principle of "magic is bad, let people edit".
> Allowing
> >> inconsistencies means we can detect errors by finding such
> inconsistencies.
> >> Automatically enforcing consistency may lead to errors propagating out
> of view
> >> of the curation process. The QA process on wikis is centered around
> edits, so
> >> every change should be an edit. Using a bot to fill in missing
> "reverse" links
> >> follows this idea. The fact that you found an issue with the data
> because you
> >> saw a bot do an edit is an example of this principle working nicely.
> >
> > That might prove to become a worser nightmare than the magic one ...
> It's seems
> > like refusing any kind of automation because it might surprise people
> for the
> > sake of exhausting them to let them do a lot of manual work.
>
> I'm not arguing against "any" kind of automation. I'm arguing against
> "invisible" automation baked into the backend software. We(*) very much
> encourage "visible" automation under community control like bots and other
> (semi-)automatic import tools like WiDaR.
>
> -- daniel
>
>
> (*) I'm part of the wikidata developer team, not an active member of the
> community. I'm primarily speaking for myself here, from my personal
> experience
> as a wikipedia and common admin. I know from past discussions that "bots
> over
> magic" is considered Best Practice among the dev team, and I believe it's
> also
> the approach preferred by the Wikidata community, but I cannot speak for
> them.
>
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Importing Freebase (Was: next Wikidata office hour)

2015-09-28 Thread John Erling Blad
Another idea: make a kind of worklist on Wikidata that reflects the
watchlist on the clients (the Wikipedias), but then we often have items on
our watchlist that we don't know much about. (Digression: somehow we should
be able to sort out those things we know (the place we live, the persons we
have met) from those things we have done (edited, copy-pasted).)

I have tried in the past to get some interest in worklists on Wikipedia,
but there isn't much interest in making them. They would speed up the
tedious task of finding the next page to edit after a given edit is
completed. It is the same problem with imports from Freebase on Wikidata:
locate the next item on Wikidata with the same queued statement from
Freebase, but within some worklist that the user has some knowledge about.

Imagine "municipalities within a county" or "municipalities that is also on
the users watchlist", and combine that with available unhandled
Freebase-statements.
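
A toy Python sketch of the idea: intersect the user's watchlist with the
queue of unhandled statements, optionally narrowed to a domain the user
knows. All identifiers and the queue itself are made up for illustration.

WATCHLIST = {"Q_mun_a", "Q_mun_b", "Q_author_x"}

FREEBASE_QUEUE = [                    # (item, property, value) awaiting review
    ("Q_mun_a", "P1082", "5321"),
    ("Q_mun_c", "P1082", "901"),
    ("Q_author_x", "P569", "1901-02-03"),
]

MUNICIPALITIES_IN_COUNTY = {"Q_mun_a", "Q_mun_b", "Q_mun_c"}

def worklist(queue, watchlist, domain=None):
    scope = watchlist if domain is None else watchlist & domain
    return [stmt for stmt in queue if stmt[0] in scope]

print(worklist(FREEBASE_QUEUE, WATCHLIST))                            # all watched items
print(worklist(FREEBASE_QUEUE, WATCHLIST, MUNICIPALITIES_IN_COUNTY))  # only the municipalities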

On Mon, Sep 28, 2015 at 10:09 PM, John Erling Blad <jeb...@gmail.com> wrote:

> Could it be possible to create some kind of info (notification?) in a
> wikipedia article that additional data is available in a queue ("freebase")
> somewhere?
>
> If you have the article on your watch-list, then you will get a warning
> that says "You lazy boy, get your ass over here and help us out!" Or
> perhaps slightly rephrased.
>
> On Mon, Sep 28, 2015 at 4:52 PM, Markus Krötzsch <
> mar...@semantic-mediawiki.org> wrote:
>
>> Hi Gerard, hi all,
>>
>> The key misunderstanding here is that the main issue with the Freebase
>> import would be data quality. It is actually community support. The goal of
>> the current slow import process is for the Wikidata community to "adopt"
>> the Freebase data. It's not about "storing" the data somewhere, but about
>> finding a way to maintain it in the future.
>>
>> The import statistics show that Wikidata does not currently have enough
>> community power for a quick import. This is regrettable, but not something
>> that we can fix by dumping in more data that will then be orphaned.
>>
>> Freebase people: this is not a small amount of data for our young
>> community. We really need your help to digest this huge amount of data! I
>> am absolutely convinced from the emails I saw here that none of the former
>> Freebase editors on this list would support low quality standards. They
>> have fought hard to fix errors and avoid issues coming into their data for
>> a long time.
>>
>> Nobody believes that either Freebase or Wikidata can ever be free of
>> errors, and this is really not the point of this discussion at all [1]. The
>> experienced community managers among us know that it is not about the
>> amount of data you have. Data is cheap and easy to get, even free data with
>> very high quality. But the value proposition of Wikidata is not that it can
>> provide storage space for a lot of data -- it is that we have a functioning
>> community that can maintain it. For the Freebase data donation, we do not
>> seem to have this community yet. We need to find a way to engage people to
>> do this. Ideas are welcome.
>>
>> What I can see from the statistics, however, is that some users (and I
>> cannot say if they are "Freebase users" or "Wikidata users" ;-) are putting
>> a lot of effort into integrating the data already. This is great, and we
>> should thank these people because they are the ones who are now working on
>> what we are just talking about here. In addition, we should think about
>> ways of engaging more community in this. Some ideas:
>>
>> (1) Find a way to clean and import some statements using bots. Maybe
>> there are cases where Freebase already had a working import infrastructure
>> that could be migrated to Wikidata? This would also solve the community
>> support problem in one way. We just need to import the maintenance
>> infrastructure together with the data.
>>
>> (2) Find a way to expose specific suggestions to more people. The
>> Wikidata Games have attracted so many contributions. Could some of the
>> Freebase data be solved in this way, with a dedicated UI?
>>
>> (3) Organise Freebase edit-a-thons where people come together to work
>> through a bunch of suggested statements.
>>
>> (4) Form wiki projects that discuss a particular topic domain in Freebase
>> and how it could be imported faster using (1)-(3) or any other idea.
>>
>> (5) Connect to existing Wiki projects to make them aware of valuable data
>> they might take from Freebase.
>>
>> Freebase is a much better resource than many other data re

[Wikidata] I know this is AbsolutlyWrong™

2015-09-28 Thread John Erling Blad
...but don't focus too much on the 1% #¤%& wrong thing, focus on the 99%
right thing.

And I do think Wikibase is done 99% right!
(And the 1% WrongThing™ is just there so I can nag Danny and Duesentrieb...)

John
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Italian Wikipedia imports gone haywire ?

2015-09-28 Thread John Erling Blad
I would like to add "minister", as there are some fine distinctions about who
is and who is not in a government. Still, we call them all "ministers". Very
confusing, and very obvious at the same time.

There are also the differences in organisation of American municipalities,
oh what a glorious mess!

Then you have the differences between a state, its mainland, and its dependent
territories (bilands), not to mention the uninhabited parts.

There is a lot that doesn't have an obvious description.

And btw, twin cities, I found a lot of errors and pretended I didn't see
them. Don't tell anyone.

On Mon, Sep 28, 2015 at 4:04 PM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 28.09.2015 13:31, Luca Martinelli wrote:
>
>> 2015-09-28 11:16 GMT+02:00 Markus Krötzsch > >:
>>
>>> If this is the case, then maybe it
>>> should just be kept as an intentionally broad property that captures
>>> what we
>>> now find in the Wikipedias.
>>>
>>
>> +1, the more broad the application of certain property is, the better.
>> We really don't need to be 100% specific with a property, if we can
>> exploit qualifiers.
>>
>
> I would not completely agree to this: otherwise we could just have a
> property "related to" and use qualifiers for the rest ;-) It's always about
> finding the right balance for each case. Many properties (probably most)
> have a predominant natural definition that is quite clear. Take "parent" as
> a simple example of a property that can have a very strict definition
> (biological parent) and still be practically useful and easy to understand.
> The trouble is often with properties that have a legal/political meaning
> since they are different in each legislation (which in itself changes over
> space and time). "Twin city" is such a case; "mayor" is another; also
> classes like "company" are like this. I think we do well to stick to the
> "folk terminology" in such cases, which lacks precision but caters to our
> users.
>
> This can then be refined in the mid and long term (maybe using qualifiers,
> more properties, or new editing conventions). Each domain could have a
> dedicated Wikiproject to work this out (the Wikiproject Names is a great
> example of such an effort [1]).
>
> Markus
>
> [1] https://www.wikidata.org/wiki/Wikidata:WikiProject_Names
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] I know this is AbsolutlyWrong™

2015-09-28 Thread John Erling Blad
Just so there are no misunderstandings, I'm right when you are wrong!
That closes the 1% gap.

CaseClosed©

On Mon, Sep 28, 2015 at 11:49 PM, Daniel Kinzler <
daniel.kinz...@wikimedia.de> wrote:

> Awww, thanks John, I think I needed that :)
>
> Am 28.09.2015 um 22:29 schrieb John Erling Blad:
> > ...but don't focus too much on the 1% #¤%& wrong thing, focus on the 99%
> right thing.
> >
> > And I do think Wikibase is done 99% right!
> > (And the 1% WrongThing™ is just there so I can nag Danny and
> Duesentrieb...)
> >
> > John
> >
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
>
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Naming projects

2015-09-14 Thread John Erling Blad
There is no consensus at nowiki on the use of this bot (it has not been
raised), and the name as such has not been discussed either. My comment on the
name is solely my opinion. I'm not interested in whether it is a pun or
not; it has a clear and direct connection to a disease that has killed
people, and I don't like it.

The bot has other issues, but that is another discussion.

On Mon, Sep 14, 2015 at 10:48 AM, Mathias Schindler <
mathias.schind...@gmail.com> wrote:

> On Mon, Sep 14, 2015 at 12:23 AM, John Erling Blad <jeb...@gmail.com>
> wrote:
> > Please do not name projects "listeria", or use any other names of
> diseases
> > that has killed people. Thank you for taking this into consideration next
> > time.
>
> Hi John,
>
> given that the premise of your email is factually incorrect (as
> listeria is a pun referring to lists, sharing the name with a family
> of bacteria named after Joseph Lister (according to Wikipedia)), is
> there now consensus that there is no issue with this bot name and that
> the recommendation only applies to picking names in general without
> any assertion that this wasn't duly taken into consideration this
> time?
>
> Mathias
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Translation of ValueView

2015-09-14 Thread John Erling Blad
Okay, I'll tell the community to hang on for a while!
Do you have a date for the transfer? I think they are eager to start
translating.

John

On Mon, Sep 14, 2015 at 9:44 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Mon, Sep 14, 2015 at 9:37 PM, John Erling Blad <jeb...@gmail.com>
> wrote:
> > Can someone point me to where ValueView is translated, thanks!
> > https://github.com/wmde/ValueView
>
> We are in the process of moving that repository. Once that is done it
> will be translated on translatewiki.
> https://phabricator.wikimedia.org/T112120
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Translation of ValueView

2015-09-14 Thread John Erling Blad
Can someone point me to where ValueView is translated, thanks!
https://github.com/wmde/ValueView
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] next 2 rounds of arbitrary access rollouts

2015-09-04 Thread John Erling Blad
Please tell us if there will be further delays! :)
John

On Wed, Sep 2, 2015 at 1:43 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Wed, Aug 19, 2015 at 3:15 PM, Lydia Pintscher
>  wrote:
> > Hi everyone,
> >
> > Update: we wanted to do the second batch last night but ran into some
> > issues we need to investigate more first before we can add another
> > huge wiki like enwp. Sorry for the delay. I'll keep you posted.
>
> We had to delay the next rollout. We now solved the issues and have
> set a new date. It'll be on September 16th. ENWP we're coming :)
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Data from Netherlands Statistics / CBS on Wikidata?

2015-08-08 Thread John Erling Blad
There have been some discussions about reuse of statistics with people
from Statistics Norway.[1] They use a format called JSON-stat.[2] A
bunch of census bureaus are starting to use JSON-stat, for example
Statistics Norway, UK’s Office for National Statistics, Statistics
Sweden, Statistics Denmark, Instituto Galego de Estatística, and
Central Statistics Office of Ireland. I've heard about others too.

I have started on a rant about it at Meta, but I didn't finish it.[3]
Perhaps more people will join in? ;)

A central problem is that statistics are often produced as a
multidimensional dataset, where our topics are only single indices on
one of the dimensions. We can extract the relevant data, but it is
probably better to make a kind of composite key into the dataset to
identify the relevant stuff about our topic. That key can be stored as
a table-specific statement in Wikidata, and with a little bit of
planning it can be statistics-specific or even bureau-specific.
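
To make the composite-key idea concrete, here is a rough Python sketch. The
dataset shape loosely follows JSON-stat (dimension order in "id", lengths in
"size", values in a flat row-major "value" array), but the codes and figures
below are made up for illustration:

  dataset = {
      "id": ["region", "year"],
      "size": [2, 2],
      "dimension": {
          "region": {"category": {"index": {"0301": 0, "5001": 1}}},  # code -> position
          "year": {"category": {"index": {"2013": 0, "2014": 1}}},
      },
      "value": [623966, 634463, 179692, 182035],  # illustrative population figures
  }

  def cell(ds, key):
      # Walk the dimensions in order and compute the offset into the flat array.
      offset = 0
      for dim, size in zip(ds["id"], ds["size"]):
          offset = offset * size + ds["dimension"][dim]["category"]["index"][key[dim]]
      return ds["value"][offset]

  # The composite key is what we would store as a table-specific statement:
  print(cell(dataset, {"region": "0301", "year": "2014"}))  # -> 634463

The key itself is tiny and stable, so it can be statistics-specific or
bureau-specific without bloating the item.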

[1] https://ssb.no/en/
[2] http://json-stat.org/
[3] 
https://meta.wikimedia.org/wiki/Grants:IdeaLab/Import_and_visualize_census_data

On Fri, Aug 7, 2015 at 3:06 PM, Joe Filceolaire filceola...@gmail.com wrote:
 Hypercubes and csv flat files belong in commons in my opinion  (commons may
 have a different opinion ). That's if we even want to store a copy.

 This source data should then be translated into wikidata triples and
 statements and imported into wikidata items.

 The statements in wikidata are then used to generate lists and tables and
 graphs and info graphics in wikipedia.

 At least that's how I see it

 Joe


 On Thu, 6 Aug 2015 17:00 Jane Darnell jane...@gmail.com wrote:

 I have used the CBS website to compile my own statistics for research.
 Their data is completely available online as far as I know and you can
 download the queries you run on the fly in .csv file format, or text or
 excel. They have various data tables depending on what you find interesting
 and complete tables of historical data is also available. That said, I think
 any pilot project would need to start with their publications, which are
 also available online. These can be freely used as sources for statements.
 Interesting data for Wikidata could be population statistics of major cities
 per century or employment statistics per city per century and so forth. See
 CBS.nl

 On Thu, Aug 6, 2015 at 5:30 PM, Gerard Meijssen
 gerard.meijs...@gmail.com wrote:

 Hoi,
 As far as I am concerned, data that is available on the web is fine if
 you use data on the web. It makes no difference when the data is to be used
 in the context of the WMF.

 When the CBS shares data with us in Wikidata, it makes the data available
 in Wikipedia.

 It is why I would like something small, a pilot project, something
 that we can build on.
 Thanks,
  GerardM

 On 6 August 2015 at 17:23, Thad Guidry thadgui...@gmail.com wrote:

 Netherlands Statistics should just post the data on the web...so that
 anyone can use its Linked Data.

 And actually, CSV on the Web is now a reality (no longer a need for
 XBRL)

 https://lists.w3.org/Archives/Public/public-vocabs/2015Jul/0016.html

 As DanBri notes in his P.S. at the bottom of the above link.. the
 ability (in the csv2rdf doc) to map from

 rows in a table via templates into RDF triples is very powerful.

 Thad
 +ThadGuidry

 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata



 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata


 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata


 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Maintenance scripts for clients

2015-08-04 Thread John Erling Blad
We lack several maintenance scripts for the clients, that is, human-readable
special pages with reports on which pages lack special treatment. In no
particular order: we need some way to identify unconnected pages in general
(the present one does not work [1]), we need some way to identify pages that
are unconnected but have some language links, we need to identify items that
are used in some language and lack labels (almost like [2], but on the client
and for items that are somehow connected to pages on the client), and we need
to identify items that lack specific claims while the client pages use a
specific template.

There are probably more such maintenance pages; these are the most urgent
ones. Users have now started to create categories to hack around the missing
maintenance pages, which produces a pile of categories.[3] At Norwegian
Bokmål only a few scripts utilize data from Wikidata, yet the number of
categories is already growing large.

For us at the receiving end this is a show stopper. We can't convince the
users that this is a positive addition to the pages without the maintenance
scripts, because without them we are more or less flying blind when we try
to fix errors. We can't poke at random pages hoping to find something that
is wrong; we must be able to search for the errors and fix them.

This summer we (nowiki) have added about ten (10) properties to the
infoboxes, some with scripts and some with the property parser function.
Most of my time has not gone into coding, and not into fixing errors; it has
gone into trying to explain to the community why Wikidata is a good idea. At
one point the changes were even reverted because someone disagreed with what
we had done. The whole thing basically revolves around "my article got a
Q-id in the infobox and I don't know how to fix it". We know how to fix it,
and I have explained that to the editors at nowiki several times. They still
don't get it, so we need some way to fix it, and we don't have maintenance
scripts to do it.

Right now we don't need more wild ideas that will swamp the
development for months and years to come, we need maintenance scripts,
and we need them now!

[1] https://no.wikipedia.org/wiki/Spesial:UnconnectedPages
[2] https://www.wikidata.org/wiki/Special:EntitiesWithoutLabel
[3] https://no.wikipedia.org/wiki/Spesial:Prefiksindeks/Kategori:Artikler_hvor

John Erling Blad
/jeblad

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-13 Thread John Erling Blad
No, we should not make the aliases unique; the reason aliases are
useful is precisely that they are _not_ unique.
Add versioning to labels; that is the only real solution.

There are books on the topic, and also some doctoral theses. I don't think
we should create anything ad hoc for this. Go for a proven solution.

On Mon, Jul 13, 2015 at 3:24 PM, Daniel Kinzler
daniel.kinz...@wikimedia.de wrote:
 Am 13.07.2015 um 13:00 schrieb Ricordisamoa:
 I agree too.
 Also note that property IDs are language-neutral, unlike english names of
 templates, magic words, etc.

 As I said: if there is broad consensus to only use P-numbers to refer to
 properties, fine with me (note however that Lydia disagrees, and it's her
 decision). I like the idea of having the option of accessing properties via
 localized names, but if there is no demand for this possibility, and it's a 
 pain
 to implement, I won't complain about dropping support for that.

 But *if* we allow access to properties via localized unique labels (as we
 currently do), then we really *should* allow the same via unique aliases, so
 property labels can be changed without breaking stuff.

 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.

 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-08 Thread John Erling Blad
Another way to formulate this: the labels are our names for the properties
and we can force them to be unique, while the aliases are other people's
names for similar properties, and since we can't control those, they won't
be unique.

On Wed, Jul 8, 2015 at 1:43 PM, John Erling Blad jeb...@gmail.com wrote:
 We will get clashes between different ontologies, can't see how we can
 avoid that. Our label should be unique, but not aliases. We use
 aliases as a way to access something that we later must disambiguate.
 We should not have a uniqueness constraint on aliases, it simply makes
 no sense.

 On Wed, Jul 8, 2015 at 1:23 PM, Daniel Kinzler
 daniel.kinz...@wikimedia.de wrote:
 Am 08.07.2015 um 13:11 schrieb Gerard Meijssen:
 Technically there is no problem disambiguating. People are really good
 understanding what a property means based on context. Machines do not care 
 for
 labels (really)..

 For items, that is exactly how it is. For properties, however, that is not
 the case.

 Consider {{#property:date of birth}}. That's much more readable than
 {{#property:P569}}, right? That's why properties can be *addressed* by their
 label, when transcluding data into wikitext. Properties have unique *names* 
 by
 which they can be *used*, not just labels for display, like items do.

 The problem we have is that you cannot change a property's label, because
 you
 would break usage in {{#property}} calls. Unless you keep the old label as an
 alias. Which can only work if the alias is unique, too.

 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.

 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-08 Thread John Erling Blad
We will get clashes between different ontologies, can't see how we can
avoid that. Our label should be unique, but not aliases. We use
aliases as a way to access something that we later must disambiguate.
We should not have a uniqueness constraint on aliases, it simply makes
no sense.

On Wed, Jul 8, 2015 at 1:23 PM, Daniel Kinzler
daniel.kinz...@wikimedia.de wrote:
 Am 08.07.2015 um 13:11 schrieb Gerard Meijssen:
 Technically there is no problem disambiguating. People are really good
 understanding what a property means based on context. Machines do not care 
 for
 labels (really)..

 For items, that is exactly how it is. For properties, however, that is not
 the case.

 Consider {{#property:date of birth}}. That's much more readable than
 {{#property:P569}}, right? That's why properties can be *addressed* by their
 label, when transcluding data into wikitext. Properties have unique *names* by
 which they can be *used*, not just labels for display, like items do.

 The problem we have is that you cannot change a property's label, because you
 would break usage in {{#property}} calls. Unless you keep the old label as an
 alias. Which can only work if the alias is unique, too.

 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.

 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] property label/alias uniqueness

2015-07-08 Thread John Erling Blad
You asked for an example, and those are valid examples. It is even an
example that uses one of the most used ontologies on the net. Another
example from DCterms is coverage, which can be both temporal and
spatial. We have a bunch of properties that could carry the alias "DCterms
coverage", a country for example, or a year.

Use a separate list of deferred labels, and put the existing label on that
list if someone tries to edit the defined (preferred) label. That list should
be unique, in the sense that it should not be possible to save a new label
that already exists on the list of deferred labels. At some future point in
time a clean-up routine can be implemented, but I think it will take a long
time before name clashes become a real problem.
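
A rough sketch in Python of what I mean (entirely hypothetical; nothing like
this exists in Wikibase): one registry maps every current and deferred label
to a property id, so names stay unique and old names keep resolving after a
rename.

  class LabelRegistry:
      def __init__(self):
          self.current = {}  # property id -> current (preferred) label
          self.owner = {}    # label, current or deferred -> property id

      def set_label(self, pid, label):
          if label in self.owner and self.owner[label] != pid:
              raise ValueError("label already in use or previously used")
          if pid in self.current:
              self.owner[self.current[pid]] = pid  # old label is deferred and keeps resolving
          self.current[pid] = label
          self.owner[label] = pid

      def resolve(self, label):
          return self.owner.get(label)

  registry = LabelRegistry()
  registry.set_label("P569", "date of birth")
  registry.set_label("P569", "birth date")             # rename
  assert registry.resolve("date of birth") == "P569"   # old calls keep working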

At some point I think we should seriously consider using SKOS; see the SKOS
Simple Knowledge Organization System Reference:
http://www.w3.org/TR/skos-reference/

On Wed, Jul 8, 2015 at 4:21 PM, Daniel Kinzler
daniel.kinz...@wikimedia.de wrote:
 Am 08.07.2015 um 14:13 schrieb John Erling Blad:
 What you want is closer to a redirect than an alias, while an alias is
 closer to a disambiguation page.

 Yes. The semantics of labels on properties is indeed different from labels on
 items, and always has been. Property labels are defined to be unique names.
 Extending the uniqueness to aliases allows them to act as redirects, which
 allows us to rename or move properties (that is, change their label).

 Using (unique) aliases for this seems the simplest solution. Introducing 
 another
 kind-of-aliases would be confusing in the UI as well as in the data model, and
 would require a lot more code, which is nearly exactly the same as for 
 aliases.

 I don't follow your example with the DC vocabulary. For the height, width and
 length properties, why would one want an alias that is the same for all of 
 them?
 What would that be useful for?


 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.

 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] calendar model screwup

2015-07-01 Thread John Erling Blad
That should be default calendar model.
My screw up... ;/

ons. 1. jul. 2015, 20.08 skrev John Erling Blad jeb...@gmail.com:

 Wouldn't it be better to use iso8601 as internal format?

 ons. 1. jul. 2015, 18.45 skrev Markus Krötzsch 
 mar...@semantic-mediawiki.org:

 On 01.07.2015 18:14, Peter F. Patel-Schneider wrote:
  On 07/01/2015 07:00 AM, Pierpaolo Bernardi wrote:
  On Wed, Jul 1, 2015 at 8:17 AM, Markus Krötzsch
  mar...@semantic-mediawiki.org wrote:
  Dear Pierpaolo,
 
  This thread was only about Julian and Gregorian calendar dates. If and
  how other calendar models should be supported in some future is
  another (potentially big) discussion. As you said, there are many
  issues there. Let's first make sure that we handle the easy 99.9% of
  cases correctly before discussing any more complicated options.
 
  Lydia Pintscher in the starting email explained that there's a model
 for
  calendars, and unfortunately this model could be (and has been)
  interpreted in two ways (AFAIU).
 
  My intention was to point out that one of the two interpretations is
 not
  sound.  This leaves the other one as the only viable one.
 
  Cheers P.
 
  It appears (from the email only---there are no pointers to enduring
  documentation on the solution that are attached to the relevant classes
 or
  properties) that the chosen method is to store dates in both the source
  calendar and the proleptic Gregorian calendar
  (
 https://www.wikidata.org/wiki/Wikidata:Project_chat#calendar_model_screwup
 ).
  As you point out, this is not a viable solution for calendars whose
 days do
  not start at the same time as days in the proleptic Gregorian calendar
  (unless, of course, there is time and location information also
 available).

 The Wikidata date implementation intentionally restricts to dates that
 are compatible with the Gregorian calendar. Although the system refers
 to Wikidata item ids of calendar models to denote Proleptic Gregorian
 and Proleptic Julian, the system does not allow users or bots to enter
 arbitrary items as calendar model.

 My understanding (and the implementation in WDTK) is that all dates are
 provided in Gregorian calendar with a calendar model that specifies how
 they should be displayed (if possible). The date in the source calendar
 is for convenience and maybe for technical reasons on the side of the
 PHP implementation. At no time should the source calendar date be
 impossible to convert to Gregorian. We have had extensive discussions
 about this point -- Gregorian must remain the main format at all times.

 This does not mean that we cannot have more models in the future. There
 is (currently unused) timezone information, which can be used to store
 offsets. Once fully implemented, this might allow exact conversion from
 calendar models that have another start for their days. So maybe this is
 not a case of real incompatibility. However, the timezone support for
 current dates needs to be finished before discussing the next steps into
 more exotic calendars.

 Best regards,

 Markus


 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] calendar model screwup

2015-07-01 Thread John Erling Blad
Wouldn't it be better to use ISO 8601 as the internal format?
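
For reference, this is roughly what a time value looks like today, written
out as a Python dict (a sketch from memory; check the data model
documentation for the authoritative field list). The timestamp is already
ISO 8601-like, and the calendar model is an explicit item URI; which calendar
the timestamp itself is normalized to is exactly what this thread is about.

  date_of_birth = {
      "time": "+1952-03-11T00:00:00Z",
      "timezone": 0,
      "before": 0,
      "after": 0,
      "precision": 11,  # 11 = day
      "calendarmodel": "http://www.wikidata.org/entity/Q1985727",  # proleptic Gregorian
  }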

ons. 1. jul. 2015, 18.45 skrev Markus Krötzsch 
mar...@semantic-mediawiki.org:

 On 01.07.2015 18:14, Peter F. Patel-Schneider wrote:
  On 07/01/2015 07:00 AM, Pierpaolo Bernardi wrote:
  On Wed, Jul 1, 2015 at 8:17 AM, Markus Krötzsch
  mar...@semantic-mediawiki.org wrote:
  Dear Pierpaolo,
 
  This thread was only about Julian and Gregorian calendar dates. If and
  how other calendar models should be supported in some future is
  another (potentially big) discussion. As you said, there are many
  issues there. Let's first make sure that we handle the easy 99.9% of
  cases correctly before discussing any more complicated options.
 
  Lydia Pintscher in the starting email explained that there's a model for
  calendars, and unfortunately this model could be (and has been)
  interpreted in two ways (AFAIU).
 
  My intention was to point out that one of the two interpretations is not
  sound.  This leaves the other one as the only viable one.
 
  Cheers P.
 
  It appears (from the email only---there are no pointers to enduring
  documentation on the solution that are attached to the relevant classes
 or
   properties) that the chosen method is to store dates in both the source
   calendar and the proleptic Gregorian calendar
  (
 https://www.wikidata.org/wiki/Wikidata:Project_chat#calendar_model_screwup
 ).
  As you point out, this is not a viable solution for calendars whose days
 do
   not start at the same time as days in the proleptic Gregorian calendar
  (unless, of course, there is time and location information also
 available).

 The Wikidata date implementation intentionally restricts to dates that
 are compatible with the Gregorian calendar. Although the system refers
 to Wikidata item ids of calendar models to denote Proleptic Gregorian
 and Proleptic Julian, the system does not allow users or bots to enter
 arbitrary items as calendar model.

 My understanding (and the implementation in WDTK) is that all dates are
 provided in Gregorian calendar with a calendar model that specifies how
 they should be displayed (if possible). The date in the source calendar
 is for convenience and maybe for technical reasons on the side of the
 PHP implementation. At no time should the source calendar date be
 impossible to convert to Gregorian. We have had extensive discussions
 about this point -- Gregorian must remain the main format at all times.

 This does not mean that we cannot have more models in the future. There
 is (currently unused) timezone information, which can be used to store
 offsets. Once fully implemented, this might allow exact conversion from
 calendar models that have another start for their days. So maybe this is
 not a case of real incompatibility. However, the timezone support for
 current dates needs to be finished before discussing the next steps into
 more exotic calendars.

 Best regards,

 Markus


 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] calendar model screwup

2015-06-30 Thread John Erling Blad
I may have said this before: it is very easy to get things screwed up when
a value must reference a datum (a calendar is a datum for time). I think
this is perhaps one of the most common errors on Wikipedia; we just assume
there is a single global datum. Usually there is not, it is only a matter of
precision in the number, and some datum error bites you in the rump!

Big thanks to Lydia and the team for providing an explanation; it makes it
much easier to fix. Don't waste too much time on how it happened, it is far
more important to figure out how to fix it.

Keep up the good work, and don't forget to report failures. That's how we
all learn.

John Erling Blad
/jeblad

(Okay, I do laugh a little, but not very much!)

On Tue, Jun 30, 2015 at 9:21 PM, Joe Filceolaire filceola...@gmail.com
wrote:

 Can I just ask all of you who want to demand an enquiry as to how this
 happened to hold off until the problem has been fixed

 Please

 No post mortem while the patient is still alive

 Joe

 On Tue, 30 Jun 2015 18:39 Lydia Pintscher lydia.pintsc...@wikimedia.de
 wrote:

 Hi everyone,

 I have some bad news. We screwed up. I’m really sorry about this. I’d
 really appreciate everyone’s help with fixing it.

 TLDR: We have a bad mixup of calendar models for the dates in Wikidata
 and we need to fix them.

  What happened? 
 Wikidata dates have a calendar model. This can be Julian or Gregorian
 and the plan is to support more in the future. There are two ways to
 interpret this calendar model:
 # the given date is in this calendar model
 # the given date is Gregorian and this calendar model says if the date
 should be displayed in Gregorian or Julian in the user interface

 Unfortunately both among the developers as well as bot operators there
 was confusion about which of those is to be used. This lead to
 inconsistencies in the backend/frontend code as well as different bot
 authors treating the calendar model differently. In addition the user
 interface had problematic defaults. We now have a number of dates with
 a potentially wrong calendar model. The biggest issue started when we
 moved code from the frontend to the backend in Mid 2014 in order to
 improve performance. Prior to the move, the user interface used to
 make the conversion from one model to the other. After the move, the
 conversion was not done anywhere anymore - but the calendar model was
 still displayed. We made one part better but in the process broke
 another part badly :(

  What now? 
 * Going forward the date data value will be given in both the
 normalized proleptic Gregorian calendar as well as in the calendar
 model explicitly given (which currently supports, as said, proleptic
 Gregorian and proleptic Julian).
 * The user interface will again indicate which calendar model the date
 is given in. We will improve documentation around this to make sure
 there is no confusion from now on.
 * We made a flowchart to help decide what the correct calendar model
 for a date should be to help with the clean up.
 * We are improving the user interface to make it easier to understand
 what is going on and by default do the right thing.
 * We are providing a list of dates that need to be checked and
 potentially fixed.
 * How are we making sure it doesn’t happen again?
 * We are improving documentation around dates and will look for other
 potential ambiguous concepts we have.

  How can we fix it? 
 We have created a list of all dates that potentially need checking. We
 can either provide this as a list on some wiki page or run a bot to
 add “instance of: date needing calendar model check“ or something
 similar as a qualifier to the respective dates. What do you prefer?
 The list probably contains dates we can batch-change or approve but
 we’d need your help with figuring out which those are.
 We also created a flowchart that should help with making the decision
 which calendar model to pick for a given date:

 https://commons.wikimedia.org/wiki/File:Wikidata_Calendar_Model_Decision_Tree.svg

 Thank you to everyone who helped us investigate and get to the bottom
 of the issue. Sorry again this has happened and is causing work. I
 feel miserable about this and if there is anything more we can do to
 help with the cleanup please do let me know.


 Let's please keep further discussion about this in one place on-wiki
 at
 https://www.wikidata.org/wiki/Wikidata:Project_chat#calendar_model_screwup


 Cheers
 Lydia

 --
 Lydia Pintscher - http://about.me/lydia.pintscher
 Product Manager for Wikidata

 Wikimedia Deutschland e.V.
 Tempelhofer Ufer 23-24
 10963 Berlin
 www.wikimedia.de

 Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

 Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
 unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
 Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.


Re: [Wikidata] University automatically being College alias ?

2015-06-15 Thread John Erling Blad
There are a lot of places called Parma, and it is not obvious which
one should be listed first. Perhaps Parma in Tibet?

This is actually a problem that can't be solved easily. Parma is
interpreted in a cultural context, and the Italian city is just one of
several places with that name. It might be obvious to an Italian
that Parma is an Italian city, but is it equally obvious to someone
from Tibet? What about Parma, Ohio, where more than 80,000 people
live?

The solution is to use a set of standardized user models; typically
they would follow the language regions.

On Mon, Jun 15, 2015 at 5:45 PM, Federico Leva (Nemo)
nemow...@gmail.com wrote:
 John Erling Blad, 15/06/2015 17:00:

 If I search for Oslo and live in Norway it is highly likely that I
 want the article about the city in Norway. If I live in Marshall
 County, Minnesota, it is not so obvious that I want the city in Norway
 to be ranked first.


  If the Chinese really create a city called Parma to sell more prosciutto, I want
 Chinese users to be given the real Parma first always. :)

 Nemo


 ___
 Wikidata mailing list
 Wikidata@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-l] Wikidata for Wiktionary

2015-05-16 Thread John Erling Blad
Your description is pretty far from what's in the proposal right now.
The proposal is not clear at all, so I would say update it and
resubmit it for a new discussion.

On Sat, May 16, 2015 at 12:21 PM, Daniel Kinzler
daniel.kinz...@wikimedia.de wrote:
 Am 15.05.2015 um 01:11 schrieb John Erling Blad:
 How do we go from a spelled form of a lexeme at Wiktionary and to an
 identifier on Wikidata?

 What do you mean by go to? And what do you mean by identifier on Wikidata 
 -
 Items, Lexemes, Senses, or Forms?

 Generally, Wiktionary currently combines words with the same rendering from
 different languages on a single page. So a single Wiktionary page would
 correspond to several Lexeme entries on Wikidata, since Lexemes on wikidata
 would be split per language.

 I suppose a Lexeme-Entry could be linked back to the corresponding pages on 
 the
 various Wiktionaries, but I don't really see the value of that, and sitelinks
 are currently not planned for Lexeme entries. It probably makes more sense for
 the Wiktionary pages to explicitly reference the Wikidata-Lexeme that
 corresponds to each language-section on the page.

 And how do we go from one Sense to another
 synonym Sense? Do we use statements? But then only the L-identifiers
 can be used, so we will link them at the Lexeme level..

 Why can only L-Identifiers be used? Senses (and Forms) are entities and have
 identifiers. They wouldn't have a wiki-page of their own, but that's not a
 problem. The intention is that it's possible for one Sense to have a statement
 referring directly to another Sense (of the same or a different Lexeme).

 Wiktionary is organized around homonyms while Wikipedia is organized
 around synonyms, especially across languages, and I think this
 difference creates some of the problems.

 The Lexeme-Part of Wikidata (L-ids) would be separate from the Concept-part of
 Wikidata (Q-ids). The Lexeme part is organized around homonyms (more 
 precisely,
 homographs in a single language). Each Lexeme can have several Senses 
 modeled
 as sub-entities, meaning that each Sense has its own set of Statements. Each
 Sense can be linked to Senses of other Lexemes (explicit synonyms or
 translations) and to Q-id concepts (implicit synonyms or translations) using
 Statements.


 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Using Special:Unconnected Pages? Please read.

2015-05-15 Thread John Erling Blad
...and one additional thing: use of this page is client side, not
server side. That implies that the communities on the client side should
be asked, not the community on the server side. Which means that this
list is the wrong forum.

John

On Fri, May 15, 2015 at 2:58 PM, John Erling Blad jeb...@gmail.com wrote:
 That would effectively make the output quasi-random, make paging
 confusing, and make lookup of specific pages and following pages
 impossible. In all, I think this makes the page useless for anybody
 except those projects that have managed to clean up the remaining
 unconnected pages, and those that have fewer than 5000 (I think that is
 the limit) pages in the list.

 I would rather suggest that an optional category should be added, and
 that it should be mandatory if the namespace page count indicates that
 the number of pages goes above some limit.

 I think the start was implemented as a prefix search originally, but I
 wonder if that is still the case. It could be wise to check it out. It
 could be an idea to use a default prefix search if none is added.

 If neither a prefix pattern nor a prefix search is used, then the
 first N hits, sorted by page-id, can be returned. It should also be
 possible to switch between oldest first and newest first.

 The seeming caching behavior is probably just the update arriving
 late, but it could be an indication of other issues. The same bug was
 fixed several years ago, and disappeared, so it can be a new bug.

 It is interesting that there are now real needs for caching and an
 API, the page itself wasn't much appreciated and deemed unnecessary
 when it was created. This it has in common with a lot of the
 maintenance pages, the editors use them even if they are really
 crappy. Other special pages are using old, often several days old,
 reports as their source. When it comes to maintenance reports they
 should be up to date and actionable, not outdated and questionable.

 /rant

 John

 On Fri, May 15, 2015 at 2:07 PM, Lydia Pintscher
 lydia.pintsc...@wikimedia.de wrote:
 Thanks so much for the feedback. That was useful. We'll rework the
 page to make it not time-out as it currently does on larger wikis. The
 way we'll go for to achieve this is by sorting the pages by page-id
 rather than page-title. That should also make it relatively easy to
 find the newest or oldest pages depending on which you're working on.
 We'll also provide a filter for the namespace but that might come a
 bit later.


 Cheers
 Lydia

 --
 Lydia Pintscher - http://about.me/lydia.pintscher
 Product Manager for Wikidata

 Wikimedia Deutschland e.V.
 Tempelhofer Ufer 23-24
 10963 Berlin
 www.wikimedia.de

 Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

 Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
 unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
 Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Using Special:Unconnected Pages? Please read.

2015-05-15 Thread John Erling Blad
Thanks! What you are saying is that you don't want any feedback.

On Fri, May 15, 2015 at 3:07 PM, Lydia Pintscher
lydia.pintsc...@wikimedia.de wrote:
 On Fri, May 15, 2015 at 3:01 PM, John Erling Blad jeb...@gmail.com wrote:
 ...and one additional thing; use of this page is client side, not
  server side. That implies that the communities on the client side should
  be asked, not the community on the server side. Which means that this
 list is the wrong forum.

 John, I got the answers I needed from the people who actually use the
 page and we're making it more useful for them based on this feedback
 within the technical constraints we have right now.


 Cheers
 Lydia

 --
 Lydia Pintscher - http://about.me/lydia.pintscher
 Product Manager for Wikidata

 Wikimedia Deutschland e.V.
 Tempelhofer Ufer 23-24
 10963 Berlin
 www.wikimedia.de

 Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

 Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
 unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
 Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Wikidata for Wiktionary

2015-05-14 Thread John Erling Blad
Seems like this is doable, and it does describe a solution for how
Wiktionary can be linked from Wikidata. It is, however, not completely
clear to me how some remaining problems can be solved.

How do we go from a spelled form of a lexeme at Wiktionary to an
identifier on Wikidata? And how do we go from one Sense to another,
synonymous Sense? Do we use statements? But then only the L-identifiers
can be used, so we would link them at the Lexeme level…

Wiktionary is organized around homonyms while Wikipedia is organized
around synonyms, especially across languages, and I think this
difference creates some of the problems.

On Fri, May 15, 2015 at 12:36 AM, John Erling Blad jeb...@gmail.com wrote:
 Yes, found a sentence in task 2. :)

 On Fri, May 15, 2015 at 12:34 AM, Daniel Kinzler
 daniel.kinz...@wikimedia.de wrote:
 Am 14.05.2015 um 23:54 schrieb John Erling Blad:
 Let me rephrase, and the question is for Denny unless someone knows the 
 answer.

 Lexemes at different languages share a spelling, and that is the
 reason why they are linked together. That kind of linkage can be
 automated. Some other pages (usually in other namespaces) at those
 projects should be linked too, but can't be handled automatically.
 Would they be handled as sitelinks in Items?

 Yes, I'd assume so.


 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Wikidata for Wiktionary

2015-05-14 Thread John Erling Blad
Let me rephrase, and the question is for Denny unless someone knows the answer.

Lexemes in different languages share a spelling, and that is the
reason why they are linked together. That kind of linkage can be
automated. Some other pages (usually in other namespaces) at those
projects should be linked too, but can't be handled automatically.
Would they be handled as sitelinks in Items?

John

On Thu, May 14, 2015 at 5:59 PM, Gerard Meijssen
gerard.meijs...@gmail.com wrote:
 Hoi,
 From a Wiktionary point of view they are not the same.  Wiktionary links
 articles that have the same spelling in common. For every meaning in every
 language they link to the articles that have a specific spelling and it is
 potluck if that meaning actually exists.
 Thanks,
  GerardM

 On 14 May 2015 at 16:49, John Erling Blad jeb...@gmail.com wrote:

 As I read your proposal you want to automate IW-linkage of similar
 lexemes, but how do you want to handle those cases where the lexemes
  are not similar? Your example of the tea room vs les questions sur les
  mots is such a case. Is this handled as a mixed automatic/manual
 case, with lexemes added automatically and the additional ones added
 manually?

 Can you elaborate on how you want to handle word form vs word sense?

 John

 On Thu, May 7, 2015 at 4:54 AM, Denny Vrandečić vrande...@gmail.com
 wrote:
  It is rather clear that everyone wants Wikidata to also support
  Wiktionary,
  and there have been plenty of proposals in the last few years. I think
  that
  the latest proposals are sufficiently similar to go for the next step: a
  break down of the tasks needed to get this done.
 
  Currently, the idea of having Wikidata supporting Wiktionary is stalled
  because it is regarded as a large monolithic task, and as such it is
  hard to
  plan and commit to. I tried to come up with a task break-down, and
  discussed
  it with Lydia and Daniel, and now, as said in the last office hour, here
  it
  is for discussion and community input.
 
 
  https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2015-05
 
  I think it would be really awesome if we would start moving in this
  direction. Wiktionary supported by Wikidata could quickly become one of
  the
  crucial pieces of infrastructure for the Web as a whole, but in
  particular
  for Wikipedia and its future development.
 
  Cheers,
  Denny
 
  ___
  Wikidata-l mailing list
  Wikidata-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikidata-l
 

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l



 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Wikidata for Wiktionary

2015-05-14 Thread John Erling Blad
Yes, found a sentence in task 2. :)

On Fri, May 15, 2015 at 12:34 AM, Daniel Kinzler
daniel.kinz...@wikimedia.de wrote:
 Am 14.05.2015 um 23:54 schrieb John Erling Blad:
 Let me rephrase, and the question is for Denny unless someone knows the 
 answer.

 Lexemes at different languages share a spelling, and that is the
 reason why they are linked together. That kind of linkage can be
 automated. Some other pages (usually in other namespaces) at those
 projects should be linked too, but can't be handled automatically.
 Would they be handled as sitelinks in Items?

 Yes, I'd assume so.


 --
 Daniel Kinzler
 Senior Software Developer

 Wikimedia Deutschland
 Gesellschaft zur Förderung Freien Wissens e.V.

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Various questions

2014-11-12 Thread John Erling Blad
There are some people with very strong beliefs about certain numbers,
like 13 and 666. Is there anyone who has complained about an assigned
number on Wikidata?

On Tue, Nov 11, 2014 at 10:51 PM, Denny Vrandečić vrande...@google.com wrote:


 On Tue Nov 11 2014 at 1:51:32 PM Denny Vrandečić vrande...@google.com
 wrote:



 On Tue Nov 11 2014 at 1:51:08 PM Denny Vrandečić vrande...@google.com
 wrote:

 +1 for removing the blacklist from the code.

 On Tue Nov 11 2014 at 12:28:05 AM John Erling Blad jeb...@gmail.com
 wrote:

 What did I say, etc, etc, etc... It feels good to be right. I was
 right. Me. I and myself.
 Some stuff always bites you, even if it was quite fun! ;)

 On Tue, Nov 11, 2014 at 9:09 AM, Jeroen De Dauw jeroended...@gmail.com
 wrote:
  Hey,
 
  I was looking through the configuration trying to debug my issues
  from my
  last
  email and noticed the list of blacklisted IDs.  They appear to be
  numbers
  with
  special meaning.  I was curious about two things, why are they
  blacklisted
  and
  what is the meaning of the remaining number?
 
  * 1: I imagine that this just refers to #1
  * 23: Probably refers to the 23 enigma
  * 42: Life the universe and everything
  * 1337: leet
  * 9001: ISO 9001, which deals with quality assurance
  * 31337: Elite
 
 
   I guess we probably ought to delete those default values. They were
  added
  for something easter-egg like in the Wikidata project, and might well
  get in
  the way for third party users. This is also not the list of actual IDs
  that
  got blacklisted on Wikidata.org, which was a bit more extensive, and
  for
  instance had Q2013, the year in which Wikidata launched. I submitted a
  removal of these blacklisted IDs from the default config in
  https://gerrit.wikimedia.org/r/#/c/172504/
 
  The only number that left me lost was 720101010. I couldn't figure
  this
  one out.
 
 
  720101010 is 1337 for trolololo :)
 
  Cheers
 
  --
  Jeroen De Dauw - http://www.bn2vs.com
  Software craftsmanship advocate
  Evil software architect at Wikimedia Germany
  ~=[,,_,,]:3
 
  ___
  Wikidata-l mailing list
  Wikidata-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikidata-l
 

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l


 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Various questions

2014-11-11 Thread John Erling Blad
What did I say, etc, etc, etc... It feels good to be right. I was
right. Me. I and myself.
Some stuff always bites you, even if it was quite fun! ;)

On Tue, Nov 11, 2014 at 9:09 AM, Jeroen De Dauw jeroended...@gmail.com wrote:
 Hey,

 I was looking through the configuration trying to debug my issues from my
 last
 email and noticed the list of blacklisted IDs.  They appear to be numbers
 with
 special meaning.  I was curious about two things, why are they blacklisted
 and
 what is the meaning of the remaining number?

 * 1: I imagine that this just refers to #1
 * 23: Probably refers to the 23 enigma
 * 42: Life the universe and everything
 * 1337: leet
 * 9001: ISO 9001, which deals with quality assurance
 * 31337: Elite


  I guess we probably ought to delete those default values. They were added
 for something easter-egg like in the Wikidata project, and might well get in
 the way for third party users. This is also not the list of actual IDs that
 got blacklisted on Wikidata.org, which was a bit more extensive, and for
 instance had Q2013, the year in which Wikidata launched. I submitted a
 removal of these blacklisted IDs from the default config in
 https://gerrit.wikimedia.org/r/#/c/172504/

 The only number that left me lost was 720101010. I couldn't figure this
 one out.


 720101010 is 1337 for trolololo :)

 Cheers

 --
 Jeroen De Dauw - http://www.bn2vs.com
 Software craftsmanship advocate
 Evil software architect at Wikimedia Germany
 ~=[,,_,,]:3

 ___
 Wikidata-l mailing list
 Wikidata-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Wikidata RDF

2014-10-28 Thread John Erling Blad
The data model is close to RDF, but not quite. Statements in items are
reified statements, etc. Technically it is semantic data, of which RDF is
one possible representation.
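
A rough sketch of what I mean by reified, using rdflib in Python (the
namespaces and the statement node id are simplified from memory, so treat
this as an illustration rather than the exact export vocabulary):

  from rdflib import Graph, Literal, Namespace, URIRef

  WD = Namespace("http://www.wikidata.org/entity/")
  P = Namespace("http://www.wikidata.org/prop/")
  PS = Namespace("http://www.wikidata.org/prop/statement/")
  PQ = Namespace("http://www.wikidata.org/prop/qualifier/")

  g = Graph()
  stmt = URIRef("http://www.wikidata.org/entity/statement/example-guid")  # made-up node id

  g.add((WD.Q42, P.P69, stmt))             # Douglas Adams -> educated at -> (statement node)
  g.add((stmt, PS.P69, WD.Q691283))        # ... with value St John's College
  g.add((stmt, PQ.P582, Literal("1974")))  # ... and an end-time qualifier (simplified)

  print(g.serialize(format="turtle"))

The statement itself becomes a node, so qualifiers, references and ranks can
hang off it; that is the part that doesn't map one-to-one onto plain triples.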

There was a deliberate choice to keep MediaWiki to ease reuse within the
Wikimedia sites, mostly so users could reuse their knowledge, but also
so devs could reuse existing infrastructure.

Some of the problems with Wikidata come from the fact that the similarities
aren't clear enough to the users, and possibly the devs, which has
resulted in a slightly introverted community and a technical structure
that is slightly more Wikipedia-centric than necessary.

On Tue, Oct 28, 2014 at 6:48 AM, Gerard Meijssen
gerard.meijs...@gmail.com wrote:
 Hoi,
 Hell no. Wikidata is first and foremost a product that is actually used. It
 has been that way from the start. Prioritising RDF over actual practical use
 cases is imho wrong. If anything, the continuous tinkering on the format of
 the dumps has mostly brought us grief. Dumps that can no longer be read, like
 currently for the Wikidata statistics, really hurt.

 So lets not spend time at this time on RDF, Lets ensure that what we have
 works, works well and plan carefully for a better RDF but lets only have it
 go in production AFTER we know that it works well.
 Thanks,
   GerardM

 On 28 October 2014 02:46, Martynas Jusevičius marty...@graphity.org wrote:

 Hey all,

 so I see there is some work being done on mapping Wikidata data model
 to RDF [1].

 Just a thought: what if you actually used RDF and Wikidata's concepts
 modeled in it right from the start? And used standard RDF tools, APIs,
 query language (SPARQL) instead of building the whole thing from
 scratch?

 Is it just me or was this decision really a colossal waste of resources?


 [1] http://korrekt.org/papers/Wikidata-RDF-export-2014.pdf

 Martynas
 http://graphityhq.com

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] How are queries doing?

2014-01-07 Thread John Erling Blad
I agree with Denny's decision that queries should be deprioritized
compared to other, more urgent needs. A fully working data model is
obviously more important than a query system utilizing that data
model.

I have stuff I really hoped the team would prioritize a bit
higher, but I also know that they must prioritize very hard to make a
working system. In some cases that means I must wait while other
stuff gets implemented.

The Wikibase extension is a big and complex piece of software, and it
will take time to complete it. The only way to speed up development is
to get more people involved in the development, either as volunteers
or as employees. Not everyone can be employed, but a lot of people can
volunteer.

/jeblad


On Tue, Jan 7, 2014 at 11:28 PM, Denny Vrandečić vrande...@gmail.com wrote:
 The main reason why Queries are not done yet is because in the beginning of
 2013 I deprioritized them compared to the original plan. Only a single
 developer kept working on them, instead of a major part of the team, as was
 originally planned.

 I made this decision because it became clear to me that we will likely be
 able to continue the Wikidata development beyond the original 12-month plan
 (as was indeed the case) and that, in the medium run, rushing this
 functionality would only hurt the project. I thus decided to increase the
 priorities on tasks which had a higher short-term benefit and were more
 immediate, e.g. many smaller things, but also more datatypes, ranks, and
 clean-ups, but also reactions to the roll-outs which had begun back then.
 This made us highly responsive to the current needs of the community, and
 led to a sustained growth of Wikidata.

 If needed, queries could be rushed, but that would have a
 negative impact on the longer-term sustainability of the project. If it were
 deemed a higher priority, the development of queries could be sped up, but
 this comes with a sacrifice regarding other functionalities. Thus yes, more
 resources would lead to a faster development of queries (if it were decided
 that this would be the appropriate priority).

 The latter especially means that a sustained contribution from external
 developers can also lead to a faster development of the query functionality.
 We have seen with the sustained support of Benestar for the Badges
 functionality that this is feasible and possible. So instead of simply
 expressing complaints about features not being developed fast enough, how
 about actually helping to make them real? It is Open Source after all.
 Or at least simply make a case for the importance of this functionality? The
 development team keeps listening to the community like no other that I know
 of, and prioritizes their effort with respect to that.

 So, in short, blame me.

 Cheers,
 Denny




 On Tue, Jan 7, 2014 at 2:08 PM, Jan Kučera kozuc...@gmail.com wrote:

 Hm,

 nice to read all the reasoning why queries are still not possible, but
 I think we live in 2014 and not 1914, actually... seems like the problem
 is too small a budget or bad management... cannot really think of another
 reason. How much do you think it would cost to make queries a reality in
 production at Wikidata?

 Regards,
 Jan



 2013/11/29 Gerard Meijssen gerard.meijs...@gmail.com

 Hoi,
 Please understand that providing functionality like query is something
 that has to fit into a continuously live environment. This is an environment
 where the Wikidata functionality is used all the time and where some of the
 underlying functionality is changed as well. The Wikidata development is not
 happening in a vacuum.

 Given that we hope to get next week a new type of property, it should be
 obvious that Wikidata is not feature complete. When you add on extra
 functionality like a query engine, you add extra complications while the
 work is ongoing to get to the stage where Wikidata is feature complete for
 the data types.

 Another aspect is that it is NOT up to the Wikidata team to decide what goes
 into production on Wikipedia projects. The Ask 1.0 functionality, for
 instance, is at its release level. It is now for other people to determine if
 they want to include it. They have their own road maps and it is not
 obvious to an observer what the rationales are. NB Ask 1.0 is also used in
 Semantic MediaWiki and it provides a query kind of functionality. Query does
 require some performance <grin>and what is too much</grin>.

 So in one aspect there is a query functionality to be used in Wikipedia
 ea. What the query functionality that is still being built will deliver is
 not clear to me.

 On another note, there are other projects that have lingered before they
 were implemented. Nothing new here. There have been other projects that had
 to change because of external pressures. Nothing new here.

 If you want query functionality on the existing data now, there is a hack
 that works quite nicely. It makes use of data replicated to the labs
 environment. The 

[Wikidata-l] WpIAF and WdIAF?

2013-09-27 Thread John Erling Blad
It seems like some GLAM institutions are starting to use URLs to
Wikipedia articles as common identifiers for classes of items in
collections. I guess this will change over time to URLs to
Wikidata instead. This is pretty much as expected, but it is nevertheless
a bit strange that they are using Wikipedia and that no one seems to
have noticed this from our side.

Does anyone know how common this is? So far I have only heard
about a few of them, and I'm not even sure they have any clear
understanding of why they do it.

John / jeblad

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] LoCloud and Wikimedia

2013-08-26 Thread John Erling Blad
Has anyone heard about the LoCloud project in Europeana? It's about
integration of Wikimedia services and content in a cloud service for
local GLAM institutions. It seems like the project was started
officially in March 2013.

LoCloud is a cloud service for small and medium-sized institutions. It
seems like WLM will be part of this through work package 3, and then
as a microservice (!) in a larger system based on SaaS.

John

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] LoCloud and Wikimedia

2013-08-26 Thread John Erling Blad
There is a discussion on the cultural-partners list from 26th March 2013.
John

On Mon, Aug 26, 2013 at 9:47 AM, John Erling Blad jeb...@gmail.com wrote:
 Has anyone heard about the LoCloud project in Europeana? It's about
 integration of Wikimedia services and content in a cloud service for
 local GLAM institutions. It seems like the project was started
 officially in March 2013.

 LoCloud is a cloud service for small and medium-sized institutions. It
 seems like WLM will be part of this through work package 3, and then
 as a microservice (!) in a larger system based on SaaS.

 John

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] The Day the Knowledge Graph Exploded

2013-08-23 Thread John Erling Blad
Oh I guess he has an idea, but he can't prove it! ;)

On Fri, Aug 23, 2013 at 4:21 PM, Denny Vrandečić
denny.vrande...@wikimedia.de wrote:
 Oh, that's a clear and loud I have no idea :)


 2013/8/23 Tom Morris tfmor...@gmail.com

 On Fri, Aug 23, 2013 at 10:10 AM, Denny Vrandečić
 denny.vrande...@wikimedia.de wrote:


 I understand Michael's question to be much more concrete: does the
 progress in Wikidata has anything to do with the changes in the Knowledge
 Graph's visibility in Google's searches that happened last month?


 So, what's your opinion?

 Tom





 --
 Project director Wikidata
 Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
 Tel. +49-30-219 158 26-0 | http://wikimedia.de

 Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
 Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
 der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
 Körperschaften I Berlin, Steuernummer 27/681/51985.

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Wikidata language codes

2013-08-11 Thread John Erling Blad
You can't use no as a language in ULS, but you can use setlang and
uselang with no if I remember correctly. All messages are aliased to
nb if the language is no. Also, at nowiki the messages for
nb will be used, and this is an accepted solution. Previously
no.wikidata.org redirected with a setlang=no and that created a lot of
confusion, as we then had two different language codes depending on how
the page was opened. There are also bots that use the site id to
generate a language code, and that will create a no language code.

On 8/10/13, Markus Krötzsch mar...@semantic-mediawiki.org wrote:
 On 10/08/13 11:07, John Erling Blad wrote:
 The language code no is the metacode for Norwegian, and nowiki was
 in the beginning used for Norwegian Bokmål, Riksmål and Nynorsk.
 The latter split off and made nnwiki, but nowiki continued as before.
 After a while all Nynorsk content was migrated. Now nowiki has content
 in Bokmål and Riksmål; the first is official in Norway and the latter
 is an unofficial variant. After the last additions to Bokmål there are
 very few forms that are only legal in Riksmål, so for all practical
 purposes nowiki has become a pure Bokmål wiki.

 I think all content in Wikidata should use either nn or nb, and
 all existing content with no as language code should be folded into
 nb. It would be nice if no could be used as an alias for nb, as
 this is the de facto situation now, but it is probably not necessary and
 could create a discussion with the Nynorsk community.

 The site code should be nowiki as long as the community does not ask
 for a change.

 Thanks for the clarification. I will keep no to mean no for now.

 What I wonder is: if users choose to enter a no label on Wikidata,
 what is the language setting that they see? Does this say Norwegian
 (any variant) or what? That's what puzzles me. I know that a Wikipedia
 can allow multiple languages (or dialects) to coexist, but in the
 Wikidata language selector I thought you can only select real
 languages, not language groups.

 Markus



 On 8/6/13, Markus Krötzsch mar...@semantic-mediawiki.org wrote:
 Hi Purodha,

 thanks for the helpful hints. I have implemented most of these now in
 the list on git (this is also where you can see the private codes I have
 created where needed). I don't see a big problem in changing the codes
 in future exports if better options become available (it's much easier
 than changing codes used internally).

 One open question that I still have is what it means if a language that
 usually has a script tag appears without such a tag (zh vs.
 zh-Hans/zh-Hant or sr vs. sr-Cyrl/sr-Latn). Does this really mean that
 we do not know which script is used under this code (either could
 appear)?

 The other question is about the duplicate language tags, such as 'crh'
 and 'crh-Latn', which both appear in the data but are mapped to the same
 code. Maybe one of the codes is just phased out and will disappear over
 time? I guess the Wikidata team needs to answer this. We also have some
 codes that mean the same according to IANA, namely kk and kk-Cyrl, but
 which are currently not mapped to the same canonical IANA code.

 Finally, I wondered about Norwegian. I gather that no.wikipedia.org is
 in Norwegian Bokmål (nb), which is how I map the site now. However, the
 language data in the dumps (not the site data) uses both no and nb.
 Moreover, many items have different texts for nb and no. I wonder if
 both are still Bokmål, and there is just a bug that allows people to
 enter texts for nb under two language settings (for descriptions this
 could easily be a different text, even if in the same language). We also
 have nn, and I did not check how this relates to no (same text or
 different?).

 Cheers,
 Markus

 On 05/08/13 15:41, P. Blissenbach wrote:
 Hi Markus,
 Our code 'sr-ec' is at this moment effectively equivalent to 'sr-Cyrl',
 likewise
 is our code 'sr-el' currently effectively equivalent to 'sr-Latn'. Both
 might change,
 once dialect codes of Serbian are added to the IANA subtag registry at
 http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
 Our code 'nrm' is not being used for the Narom language as ISO 639-3
 does, see:
 http://www-01.sil.org/iso639-3/documentation.asp?id=nrm
 We rather use it for the Norman / Nourmaud, as described in
 http://en.wikipedia.org/wiki/Norman_language
 The Norman language is recognized by the linguist list and many others
 but as of
 yet not present in ISO 639-3. It should probably be suggested to be
 added.
 We should probably map it to a private code in the meantime.
 Our code 'ksh' is currently being used to represent a superset of what
 it stands for
 in ISO 639-3. Since ISO 639 lacks a group code for Ripuarian, we use
 the
 code of the
 only Ripuarian variety (of dozens) having a code, to represent the
 whole
 lot. We
 should probably suggest to add a group code to ISO 639, and at least
 the
 dozen+
 Ripuarian languages that we are using, and map 'ksh

Re: [Wikidata-l] Wikidata language codes

2013-08-10 Thread John Erling Blad
The language code no is the metacode for Norwegian, and nowiki was
in the beginning used for Norwegian Bokmål, Riksmål and Nynorsk.
The latter split off and made nnwiki, but nowiki continued as before.
After a while all Nynorsk content was migrated. Now nowiki has content
in Bokmål and Riksmål; the first is official in Norway and the latter
is an unofficial variant. After the last additions to Bokmål there are
very few forms that are only legal in Riksmål, so for all practical
purposes nowiki has become a pure Bokmål wiki.

I think all content in Wikidata should use either nn or nb, and
all existing content with no as language code should be folded into
nb. It would be nice if no could be used as an alias for nb, as
this is the de facto situation now, but it is probably not necessary and
could create a discussion with the Nynorsk community.

The site code should be nowiki as long as the community does not ask
for a change.

jeblad
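
As an illustration (not Wikibase code), folding no into nb on the consumer side could look roughly like this:

# Illustrative label lookup that treats "no" as an alias for "nb".
ALIASES = {"no": "nb"}   # assumption: leftover "no" content is Bokmål

def get_label(labels: dict, lang: str):
    """Return a label, folding the legacy 'no' code into 'nb'."""
    lang = ALIASES.get(lang, lang)
    label = labels.get(lang)
    if label is None and lang == "nb":
        label = labels.get("no")   # fall back to a leftover 'no' entry
    return label

labels = {"no": "Norge", "nn": "Noreg", "en": "Norway"}
print(get_label(labels, "nb"))   # -> "Norge"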

On 8/6/13, Markus Krötzsch mar...@semantic-mediawiki.org wrote:
 Hi Purodha,

 thanks for the helpful hints. I have implemented most of these now in
 the list on git (this is also where you can see the private codes I have
 created where needed). I don't see a big problem in changing the codes
 in future exports if better options become available (it's much easier
 than changing codes used internally).

 One open question that I still have is what it means if a language that
 usually has a script tag appears without such a tag (zh vs.
 zh-Hans/zh-Hant or sr vs. sr-Cyrl/sr-Latn). Does this really mean that
 we do not know which script is used under this code (either could appear)?

 The other question is about the duplicate language tags, such as 'crh'
 and 'crh-Latn', which both appear in the data but are mapped to the same
 code. Maybe one of the codes is just phased out and will disappear over
 time? I guess the Wikidata team needs to answer this. We also have some
 codes that mean the same according to IANA, namely kk and kk-Cyrl, but
 which are currently not mapped to the same canonical IANA code.

 Finally, I wondered about Norwegian. I gather that no.wikipedia.org is
 in Norwegian Bokmål (nb), which is how I map the site now. However, the
 language data in the dumps (not the site data) uses both no and nb.
 Moreover, many items have different texts for nb and no. I wonder if
 both are still Bokmål, and there is just a bug that allows people to
 enter texts for nb under two language settings (for descriptions this
 could easily be a different text, even if in the same language). We also
 have nn, and I did not check how this relates to no (same text or
 different?).

 Cheers,
 Markus

 On 05/08/13 15:41, P. Blissenbach wrote:
 Hi Markus,
 Our code 'sr-ec' is at this moment effectively equivalent to 'sr-Cyrl',
 likewise
 is our code 'sr-el' currently effectively equivalent to 'sr-Latn'. Both
 might change,
 once dialect codes of Serbian are added to the IANA subtag registry at
 http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
 Our code 'nrm' is not being used for the Narom language as ISO 639-3
 does, see:
 http://www-01.sil.org/iso639-3/documentation.asp?id=nrm
 We rather use it for the Norman / Nourmaud, as described in
 http://en.wikipedia.org/wiki/Norman_language
 The Norman language is recognized by the linguist list and many others
 but as of
 yet not present in ISO 639-3. It should probably be suggested to be added.
 We should probably map it to a private code in the meantime.
 Our code 'ksh' is currently being used to represent a superset of what
 it stands for
 in ISO 639-3. Since ISO 639 lacks a group code for Ripuarian, we use the
 code of the
 only Ripuarian variety (of dozens) having a code, to represent the whole
 lot. We
 should probably suggest to add a group code to ISO 639, and at least the
 dozen+
 Ripuarian languages that we are using, and map 'ksh' to a private code
 for Ripuarian
 meanwhile.
 Note also, that for the ALS/GSW and the KSH Wikipedias, page titles are
 not
 guaranteed to be in the languages of the Wikipedias. They are often in
 German
 instead. Details are to be found in their respective page titling rules.
 Moreover,
 for the ksh Wikipedia, unlike some other multilingual or multidialectal
 Wikipedias,
 texts are not, or quite often incorrectly, labelled as belonging to a
 certain dialect.
 See also: http://meta.wikimedia.org/wiki/Special_language_codes
 Greetings -- Purodha
 *Sent:* Sunday, 4 August 2013, 19:01
 *From:* Markus Krötzsch mar...@semantic-mediawiki.org
 *To:* Federico Leva (Nemo) nemow...@gmail.com
 *Cc:* Discussion list for the Wikidata project.
 wikidata-l@lists.wikimedia.org
 *Subject:* [Wikidata-l] Wikidata language codes (Was: Wikidata RDF
 export available)
 Small update: I went through the language list at

 https://github.com/mkroetzsch/wda/blob/master/includes/epTurtleFileWriter.py#L472

 and added a number of TODOs to the most obvious problematic cases.
 Typical problems are:

 * 

[Wikidata-l] Changes to the API for Wikidata

2012-09-05 Thread John Erling Blad
Please note that this is a breaking change for bots!

It has been decided that the wbsetitem module will change from the present
short form of the JSON structure to a long form. Exactly what the
long form will look like is still a bit open, but it will be closer to the
JSON output format. The changes also make it possible to use a
key-less format, as the language and site id will be available inside
the long form.

The following short form will be WRONG in the future
{
  "labels": {
    "de": "Foo",
    "en": "Bar"
  }
}

This long form will be RIGHT in the future
{
  "labels": {
    "de": {"value": "Foo", "language": "de"},
    "en": {"value": "Bar", "language": "en"}
  }
}

And also this will be RIGHT
{
  "labels": [
    {"value": "Foo", "language": "de"},
    {"value": "Bar", "language": "en"}
  ]
}

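For bot authors, a rough sketch of what submitting the long form might look like. The endpoint is real, but passing the JSON blob via a data parameter is an assumption based on how current API modules behave; check the module documentation linked below for the exact parameters.

# Rough sketch only; verify parameter names against the API documentation.
import json
import requests

API = "https://www.wikidata.org/w/api.php"

long_form = {
    "labels": [
        {"value": "Foo", "language": "de"},
        {"value": "Bar", "language": "en"},
    ]
}

params = {
    "action": "wbsetitem",
    "format": "json",
    "data": json.dumps(long_form),  # assumption: the JSON blob is passed as "data"
    "clear": 1,                     # rebuild the item from scratch (see below)
    "token": "EDIT_TOKEN_HERE",     # a real bot must fetch an edit token first
}

response = requests.post(API, data=params)
print(response.json())
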
New modules may use a long form if they support JSON input, but that
is for the future.

In some cases it seems necessary to add flags for what to do with
individual fields, but perhaps we can avoid it. (Typically for
adding/removing aliases, but it could also be necessary for other
fields.)

There is also a new clear URL argument that will clear the content
of an item and make it possible to rebuild it from scratch. The default
behavior is NOT to clear the content, but to incrementally build on
the existing item.

Also note that use of [no]usekeys in the URL is not supported anymore.

See also
* http://www.mediawiki.org/wiki/Extension:Wikibase/API#New_long_format

/jeblad

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l

