[Wikidata] Re: units

2023-01-24 Thread Marco Neumann
Enjoy

Best,
Marco

On Tue, Jan 24, 2023 at 11:30 PM Olaf Simons 
wrote:

> ...everything I did was, once again, 30 times more complicated,
>
> thank you very much!
> Olaf
>
>
> > Marco Neumann  wrote on 24.01.2023 at 23:48 CET:
> >
> >
> > https://tinyurl.com/2nbqnavq
> > ___
> > Wikidata mailing list -- wikidata@lists.wikimedia.org
> > Public archives at
> https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/NW3SGOW6LHX3XAYMMWJZ72LDAWSJ73MU/
> > To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>
> Dr. Olaf Simons
> Forschungszentrum Gotha der Universität Erfurt
> Am Schlossberg 2
> 99867 Gotha
> Büro: +49-361-737-1722
> Mobil: +49-179-5196880
> Privat: Hauptmarkt 17b/ 99867 Gotha
> ___
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/KXGWRI3N3FYLTLSOGJ3FJ2JVKWVOHV52/
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>


-- 


---
Marco Neumann
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/BSJQRJOJBG3ON747K2GNMHFSEPILOXBH/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: units

2023-01-24 Thread Marco Neumann
https://tinyurl.com/2nbqnavq
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/NW3SGOW6LHX3XAYMMWJZ72LDAWSJ73MU/
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata-tech] Re: Fehlerhafte Anzeige

2023-01-13 Thread Marco Neumann
oops...

link "Wien Geschichte Wiki" and the new, correct ID with the Q form of
Peter Kreuder (Q76660)

https://www.geschichtewiki.wien.gv.at/Spezial:Mit_Formular_bearbeiten/Person/Peter_Kreuder

On Fri, Jan 13, 2023 at 8:43 AM Marco Neumann 
wrote:

> Thanks for your contribution, Renate. There are now at least two
> options: a.) delete the statement in the Wikidata Q form b.) create a page in the
>
>
>
> On Fri, Jan 13, 2023 at 8:24 AM Renate Schiebel 
> wrote:
>
>> Dear Sir or Madam,
>>
>> I am researching various names of musicians, composers, actors, etc. on
>> your site for the Universität für Musik und darstellende Kunst Wien.
>> During a search I noticed that for the composer Peter Kreuder
>> (Q76660) https://www.wikidata.org/wiki/Q76660 the link to the Vienna
>> History Wiki ID 38214
>> <https://www.geschichtewiki.wien.gv.at/Special:URIResolver/?curid=38214>
>> is linked incorrectly. The page displayed there is that of Richard Franz Kreutel.
>>
>> I think it is in everyone's interest when errors are noticed and
>> reported.
>>
>> Kind regards
>> Renate Schiebel
>>
>>
>>
>> Universität für Musik und darstellende Kunst, Wien
>> Archiv
>> Tel: 0171155-6512
>> Anton-von-Webern Platz 1/I
>> schiebel-ren...@mdw.ac.at
>>
>> _______
>> Wikidata-tech mailing list -- wikidata-tech@lists.wikimedia.org
>> To unsubscribe send an email to wikidata-tech-le...@lists.wikimedia.org
>>
>
>
> --
>
>
> ---
> Marco Neumann
>
>
>

-- 


---
Marco Neumann
___
Wikidata-tech mailing list -- wikidata-tech@lists.wikimedia.org
To unsubscribe send an email to wikidata-tech-le...@lists.wikimedia.org


[Wikidata-tech] Re: Fehlerhafte Anzeige

2023-01-13 Thread Marco Neumann
Thanks for your contribution, Renate. There are now at least two
options: a.) delete the statement in the Wikidata Q form b.) create a page in the



On Fri, Jan 13, 2023 at 8:24 AM Renate Schiebel 
wrote:

> Dear Sir or Madam,
>
> I am researching various names of musicians, composers, actors, etc. on
> your site for the Universität für Musik und darstellende Kunst Wien.
> During a search I noticed that for the composer Peter Kreuder
> (Q76660) https://www.wikidata.org/wiki/Q76660 the link to the Vienna
> History Wiki ID 38214
> <https://www.geschichtewiki.wien.gv.at/Special:URIResolver/?curid=38214>
> is linked incorrectly. The page displayed there is that of Richard Franz Kreutel.
>
> I think it is in everyone's interest when errors are noticed and
> reported.
>
> Kind regards
> Renate Schiebel
>
>
>
> Universität für Musik und darstellende Kunst, Wien
> Archiv
> Tel: 0171155-6512
> Anton-von-Webern Platz 1/I
> schiebel-ren...@mdw.ac.at
>
> ___
> Wikidata-tech mailing list -- wikidata-tech@lists.wikimedia.org
> To unsubscribe send an email to wikidata-tech-le...@lists.wikimedia.org
>


-- 


---
Marco Neumann
___
Wikidata-tech mailing list -- wikidata-tech@lists.wikimedia.org
To unsubscribe send an email to wikidata-tech-le...@lists.wikimedia.org


[Wikidata] Re: WDQS State of the Union, Dec 2021

2021-12-22 Thread Marco Neumann
[image: image.png]


On Wed, Dec 22, 2021 at 5:56 PM Mike Pham  wrote:

> Hi all,
>
> I’m trying to make sure we keep everyone updated with WDQS progress. Here
> is the latest State of the Union
> <http://Wikidata:SPARQL_query_service/WDQS-State-of-the-union-2021-Dec>.
>
> Have a great end of 2021, and we’ll see you next year!
>
> Mike
>
>
>
>
> —
>
> *Mike Pham* (he/him)
> Sr Product Manager, Search
> Wikimedia Foundation <https://wikimediafoundation.org/>
>
> ___
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: Blazegraph failure playbook

2021-12-10 Thread Marco Neumann
OK, just looking at the FAQ, Mike. It looks like federation is not part of your
playbook at this point in time.

Though I personally would like to see more federation at Wikidata.
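
As a rough sketch of what querying an externally hosted subgraph from the
WDQS side could look like (the endpoint URL is hypothetical, just standard
SPARQL 1.1 federation):

SELECT ?s ?p ?o WHERE {
  SERVICE <https://subgraphs.example.org/sparql> {
    ?s ?p ?o .
  }
}
LIMIT 10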

Marco

On Fri, Dec 10, 2021 at 11:46 AM Marco Neumann 
wrote:

> Hi Mike,
>
> Maybe change the word "delete" to "move" to make it sound less contentious.
>
> These so-called subgraphs can easily be hosted elsewhere, within the realm
> of the Wikimedia Foundation domain or externally, and still be linked to
> and queryable by WDQS on alternative backends.
>
> Marco
>
> On Fri, Dec 10, 2021 at 9:59 AM Mike Pham  wrote:
>
>> Hi all,
>>
>> As many of you know, there is a risk of Blazegraph, Wikidata Query
>> Service’s backend, catastrophically failing before we are able to properly
>> scale the service.
>>
>> Thus, it is important for us to have a playbook for what actions we will
>> take in the event of such a disaster, as well as making that playbook
>> transparently available.
>>
>> This Blazegraph failure playbook
>> <https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Blazegraph_failure_playbook>
>> is now available, and we welcome you to take a look at it, though we hope
>> to never use it.
>>
>> Best,
>> Mike
>>
>>
>>
>>
>> —
>>
>> *Mike Pham* (he/him)
>> Sr Product Manager, Search
>> Wikimedia Foundation <https://wikimediafoundation.org/>
>>
>> _______
>> Wikidata mailing list -- wikidata@lists.wikimedia.org
>> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>>
>
>
> --
>
>
> ---
> Marco Neumann
> KONA
>
>

-- 


---
Marco Neumann
KONA
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: Blazegraph failure playbook

2021-12-10 Thread Marco Neumann
Hi Mike,

Maybe change the word "delete" to "move" to make it sound less contentious.

These so-called subgraphs can easily be hosted elsewhere, within the realm
of the Wikimedia Foundation domain or externally, and still be linked to
and queryable by WDQS on alternative backends.

Marco

On Fri, Dec 10, 2021 at 9:59 AM Mike Pham  wrote:

> Hi all,
>
> As many of you know, there is a risk of Blazegraph, Wikidata Query
> Service’s backend, catastrophically failing before we are able to properly
> scale the service.
>
> Thus, it is important for us to have a playbook for what actions we will
> take in the event of such a disaster, as well as making that playbook
> transparently available.
>
> This Blazegraph failure playbook
> <https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Blazegraph_failure_playbook>
> is now available, and we welcome you to take a look at it, though we hope
> to never use it.
>
> Best,
> Mike
>
>
>
>
> —
>
> *Mike Pham* (he/him)
> Sr Product Manager, Search
> Wikimedia Foundation <https://wikimediafoundation.org/>
>
> ___
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


Re: [Wikidata] Status of Wikidata Query Service

2020-02-10 Thread Marco Neumann
f reads on WDQS) will help. But not using those
> services makes them useless.
>
> What about making the lag part of the service.  I mean, you could
> reload WDQS periodically, for instance daily, and drop the updater
> altogether. Who needs to see the updates live in WDQS as soon as edits
> are done in wikidata?
>
> > We suspect that some use cases are more expensive than others (a single
> property change to a large entity will require a comparatively insane
> amount of work to update it on the WDQS side). We'd like to have real data
> on the cost of various operations, but we only have guesses at this point.
> >
> > If you've read this far, thanks a lot for your engagement!
> >
> >   Have fun!
> >
>
> Will do.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Status of Wikidata Query Service

2020-02-10 Thread Marco Neumann
I accept your apology, Guillaume, no worries.

Regards,
Marco

On Mon, Feb 10, 2020 at 2:37 PM Guillaume Lederrey 
wrote:

> On Fri, Feb 7, 2020 at 5:18 PM Guillaume Lederrey 
> wrote:
>
>> On Fri, Feb 7, 2020 at 2:54 PM Marco Neumann 
>> wrote:
>>
>>> thank you Guillaume, when do you expect a public update on the security
>>> incident [1]? Is any of our personal and private data (email, password etc)
>>> affected?
>>>
>>
>> It should be made public in the next few days. I'm not going to go into
>> any more details until this is made public, but overall, don't worry too
>> much.
>>
>
> Corrections and apologies on what I said above. We are not actually ready
> to make this ticket public. The underlying issue is under control and does
> not require any user action to mitigate. Given the security aspect, I'm not
> going to do any further communication on this.
>
> Sorry to have been misleading on this.
>
>   Enjoy your day!
>
>  Guillaume
>
>
>> best,
>>> Marco
>>>
>>> [1] https://phabricator.wikimedia.org/T241410
>>>
>>> On Fri, Feb 7, 2020 at 1:33 PM Guillaume Lederrey <
>>> gleder...@wikimedia.org> wrote:
>>>
>>>> Hello all!
>>>>
>>>> First of all, my apologies for the long silence. We need to do better
>>>> in terms of communication. I'll try my best to send a monthly update from
>>>> now on. Keep me honest, remind me if I fail.
>>>>
>>>> First, we had a security incident at the end of December, which forced
>>>> us to move from our Kafka based update stream back to the RecentChanges
>>>> poller. The details are still private, but you will be able to get the full
>>>> story soon on phabricator [1]. The RecentChange poller is less efficient
>>>> and this is leading to high update lag again (just when we thought we had
>>>> things slightly under control). We tried to mitigate this by improving the
>>>> parallelism in the updater [2], which helped a bit, but not as much as we
>>>> need.
>>>>
>>>> Another attempt to get update lag under control is to apply back
>>>> pressure on edits, by adding the WDQS update lag to the Wikdiata maxlag
>>>> [6]. This is obviously less than ideal (at least as long as WDQS updates
>>>> are lagging as often as they are), but does allow the service to recover
>>>> from time to time. We probably need to iterate on this, provide better
>>>> granularity, differentiate better between operations that have an impact on
>>>> update lag and those which don't.
>>>>
>>>> On the slightly better news side, we now have a much better
>>>> understanding of the update process and of its shortcomings. The current
>>>> process does a full diff between each updated entity and what we have in
>>>> blazegraph. Even if a single triple needs to change, we still read tons of
>>>> data from Blazegraph. While this approach is simple and robust, it is
>>>> obviously not efficient. We need to rewrite the updater to take a more
>>>> event streaming / reactive approach, and only work on the actual changes.
>>>> This is a big chunk of work, almost a complete rewrite of the updater, and
>>>> we need a new solution to stream changes with guaranteed ordering
>>>> (something that our kafka queues don't offer). This is where we are
>>>> focusing our energy at the moment, this looks like the best option to
>>>> improve the situation in the medium term. This change will probably have
>>>> some functional impacts [3].
>>>>
>>>> Some misc things:
>>>>
>>>> We have done some work to get better metrics and better understanding
>>>> of what's going on. From collecting more metrics during the update [4] to
>>>> loading RDF dumps into Hadoop for further analysis [5] and better logging
>>>> of SPARQL requests. We are not focusing on this analysis until we are in a
>>>> more stable situation regarding update lag.
>>>>
>>>> We have a new team member working on WDQS. He is still ramping up, but
>>>> we should have a bit more capacity from now on.
>>>>
>>>> Some longer term thoughts:
>>>>
>>>> Keeping all of Wikidata in a single graph is most probably not going to
>>>> work long term. We have not found examples of public SPARQL endpoints with
>>>> > 10 B triples and there is probably a goo

Re: [Wikidata] Status of Wikidata Query Service

2020-02-07 Thread Marco Neumann
ator.wikimedia.org/T221774
>
> --
> Guillaume Lederrey
> Engineering Manager, Search Platform
> Wikimedia Foundation
> UTC+1 / CET
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Scaling Wikidata Query Service

2019-06-22 Thread Marco Neumann
Thibaut, while it's certainly exciting to see continued work on the
development of storage solutions, and hybrids are most likely part of the
future story here, I'd also like to stay as close as possible to existing
Semantic Web / Linked Data standards like RDF and SPARQL to guarantee
interoperability and extensibility.

That holds no matter what mix of underlying tech is deployed under the hood.

On Fri, Jun 21, 2019 at 11:56 PM Thibaut DEVERAUX <
thibaut.dever...@gmail.com> wrote:

> Dear,
>
> I've seen this suggestion on Quora :
>
> https://www.quora.com/Wouldnt-a-mix-database-system-that-handle-both-JSON-documents-and-graph-functions-like-ArangoDB-provide-a-better-scalability-to-enormous-knowledge-graphs-like-Wikidata-than-a-classical-quadstore
>
>
>  I'm not qualified enough to know if it is relevant but this could be some
> brainstorming.
>
> Regards
>
>
>
> Le mer. 19 juin 2019 à 19:45, Finn Aarup Nielsen  a écrit :
>
>>
>> Changing the subject a bit:
>>
>> I am surprised to see how many SPARQL requests go to the endpoint when
>> performing a ShEx validation with the shex-simple Toolforge tool. They
>> are all very simple and quickly complete. For each Wikidata item tested,
>> one of our tests [1] requests tens of times. That is, testing 100
>> Wikidata items may yield thousands of requests to the endpoint in rapid
>> succession.
>>
>> I suppose that given the simple SPARQL queries, these kinds of requests
>> might not load WDQS very much.
>>
>>
>> [1]
>>
>> https://tools.wmflabs.org/shex-simple/wikidata/packages/shex-webapp/doc/shex-simple.html?data=Endpoint:%20https://query.wikidata.org/sparql=[]=%2F%2Fwww.wikidata.org%2Fwiki%2FSpecial%3AEntitySchemaText%2FE65
>>
>>
>> Finn
>> http://people.compute.dtu.dk/faan/
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Scaling Wikidata Query Service

2019-06-11 Thread Marco Neumann
And of course let's not forget the fully open-source, SPARQL 1.1 compliant RDF
database Apache Jena with TDB. Have you already evaluated Apache Jena for use
in Wikidata?



On Tue, Jun 11, 2019 at 5:07 PM Andra Waagmeester  wrote:

>
>
> On Tue, Jun 11, 2019 at 11:23 AM Jerven Bolleman et al wrote:
>
>>
>> >>  So we are playing the game since ten years now: Everybody tries other
>> databases, but then most people come back to virtuoso.
>>
>
> Nothing bad about virtuoso, on the contrary, they are a prime
> infrastructure provider (Except maybe their trademark SPARQL query: "select
> distinct ?Concept where {[] a ?Concept}" ;). But I personally think that
> replacing the current WDS with virtuoso would be a bad idea. Not from a
> performance perspective, but more from the signal it gives. If indeed as
> you state virtuoso is the only viable solution in the field, this field is
> nothing more than a niche. We really need more competition to get things
> done.
> Since both DBpedia and UniProt are indeed already running on Virtuoso -
> where it is doing a prime job -, having Wikidata running on another
> vendor's infrastructure does provide us with the so needed benchmark. The
> benchmark seems to be telling some of us already that there is room for
> other alternatives. So it is fulfilling its benchmarks role.
> Is there really no room for improvement with Blazegraph? How about graphDB?
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] minimal hardware requirements for loading wikidata dump in Blazegraph

2019-06-04 Thread Marco Neumann
Thanks, Guillaume. How does that compare to the footprint of the Wikidata
service itself (SQL), not WDQS? I presume it sits in a MyISAM storage
container?

On Tue, Jun 4, 2019 at 11:25 AM Guillaume Lederrey 
wrote:

> On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez 
> wrote:
> >
> > Hello,
> >
> > Does somebody know the minimal hardware requirements (disk size and
> > RAM) for loading wikidata dump in Blazegraph?
>
> The actual hardware requirements will depend on your use case. But for
> comparison, our production servers are:
>
> * 16 cores (hyper threaded, 32 threads)
> * 128G RAM
> * 1.5T of SSD storage
>
> > The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G.
> > The bigdata.jnl file which stores all the triples data in Blazegraph
> > is 478G but still growing.
> > I had 1T disk but is almost full now.
>
> The current size of our jnl file in production is ~670G.
>
> Hope that helps!
>
> Guillaume
>
> > Thanks,
> >
> > Adam
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
>
> --
> Guillaume Lederrey
> Engineering Manager, Search Platform
> Wikimedia Foundation
> UTC+2 / CEST
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-04 Thread Marco Neumann
Maybe it would be a good idea to run SPARQL updates directly against the
endpoint, rather than taking the detour via SQL blobs here.
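
A minimal sketch of what such a direct update could look like (the entity and
property are purely illustrative, not a proposal for the actual pipeline):

# assumes the usual wd:/wdt: prefixes; Q57009452 and P1476 (title) are just examples
DELETE { wd:Q57009452 wdt:P1476 ?oldTitle }
INSERT { wd:Q57009452 wdt:P1476 "corrected title"@en }
WHERE  { wd:Q57009452 wdt:P1476 ?oldTitle }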

How large is the RDF TTL of the page?

On Sat, May 4, 2019 at 7:37 PM Stas Malyshev 
wrote:

> Hi!
>
> > WQS data doesn't have versions, it doesn't have to be in one space and
> > can easily be separated. The whole point of LOD is to decentralize your
> > data. But I understand that Wikidata/WQS is currently designend as a
> > centralized closed shop service for several reasons granted.
>
> True, WDQS does not have versions. But each time the edit is made, we
> now have to download and work through the whole 2M... It wasn't a
> problem when we were dealing with regular-sized entities, but current
> system certainly is not good for such giant ones.
>
> As for decentralizing, WDQS supports federation, but for obvious reasons
> federated queries are slower and less efficient. That said, if there
> were separate store for such kind of data, it might work as
> cross-querying against other Wikidata data wouldn't be very frequent.
> But this is something that Wikidata community needs to figure out how to
> do.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-04 Thread Marco Neumann
Yeah, the Wikibase storage doesn't sound right here, but these are two
different issues, one with Wikibase (SQL) and one with the Wikidata Query
Service (Blazegraph).

That 2M footprint is the SQL DB blob? Each additional 2M edit is the
version history, correct?

So the issue you are referring to here is in the design of the SQL-based
"Wikibase Repository"? How does the 2M footprint and its versions compare
to a large Wikipedia blob?

WDQS data doesn't have versions, it doesn't have to be in one space, and it
can easily be separated. The whole point of LOD is to decentralize your data.
But I understand that Wikidata/WDQS is currently designed as a centralized,
closed-shop service, for several reasons, granted.




On Sat, May 4, 2019 at 8:57 AM Stas Malyshev 
wrote:

> Hi!
>
> > For the technical guys, consider our growth and plan for at least one
> > year. When the impression exists that the current architecture will not
> > scale beyond two years, start a project to future proof Wikidata.
>
> We may also want to consider if Wikidata is actually the best store for
> all kinds of data. Let's consider example:
>
> https://www.wikidata.org/w/index.php?title=Q57009452
>
> This is an entity that is almost 2M in size, almost 3000 statements and
> each edit to it produces another 2M data structure. And its dump, albeit
> slightly smaller, still 780K and will need to be updated on each edit.
>
> Our database is obviously not optimized for such entities, and they
> won't perform very well. We have 21 million scientific articles in the
> DB, and if even 2% of them would be like this, it's almost a terabyte of
> data (multiplied by number of revisions) and billions of statements.
>
> While I am not against storing this as such, I do wonder if it's
> sustainable to keep such kind of data together with other Wikidata data
> in a single database. After all, each query that you run - even if not
> related to that 21 million in any way - will have to still run in within
> the same enormous database and be hosted on the same hardware. This is
> especially important for services like Wikidata Query Service where all
> data (at least currently) occupies a shared space and can not be easily
> separated.
>
> Any thoughts on this?
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-03 Thread Marco Neumann
Looks like you are ready for the weekend, Gerard :-) I don't see a scaling
issue at the moment for the type of Wikidata use cases I come across. Even
the total number of triples is plateauing at 7.6bn*. (Of course it's easy to
write "bad" queries that bring down the server.) Allowing people to set up
their own local instances with their own triple stores in the future is a
good approach for distributed and decentralized data management
here.

That said, a faster and better Wikidata instance is always appreciated, and
can certainly be provided. What's the current cost of running/hosting the
service with Wikibase + Blazegraph per month?

Marco

*
https://grafana.wikimedia.org/d/00489/wikidata-query-service?refresh=1m=1=now-1y=now

On Fri, May 3, 2019 at 4:28 PM Gerard Meijssen 
wrote:

> Hoi,
> Lies, damned lies and statistics. The quality of Wikidata suffers, it
> could be so much better if we truly wanted Wikidata to grow. Your numbers
> only show growth within the limits of what has been made possible. Traffic
> and numbers could be much more.
> Thanks,
> GerardM
>
> On Fri, 3 May 2019 at 17:17, Marco Neumann 
> wrote:
>
>> Gerard, I like wikidata a lot, kudos to the community for keeping it
>> going. But keep it real, there is no exponential growth here.
>>
>> We are looking at a slow and sustainable growth at the moment with
>> possibly a plateauing of number of users and when it comes to total number
>> of wikidata items. just take a look at the statistics.
>>
>> Date | Content pages | Page edits since Wikidata was set up | Registered
>> users | Active users
>>
>> 4/2015  | 13,911,417  | 213,027,375 | 1,913,828 | 15,168
>> 5/2016  | 17,432,789  | 328,781,525 | 2,688,788 | 16,833
>> 7/2017  | 28,037,196  | 514,252,789 | 2,835,219 | 18,081
>> 7/2018  | 49,081,962  | 701,319,718 | 2,970,150 | 18,578
>> 4/2019  | 56,377,647  | 931,449,205 | 3,236,569 | 20,857
>>
>> When you refer to "growing like a weed". What's that page views? queries
>> per day? Mentions in the media?
>>
>> Best,
>> Marco
>>
>>
>>
>>
>> On Fri, May 3, 2019 at 3:36 PM Gerard Meijssen 
>> wrote:
>>
>>> Hoi,
>>> This mail thread is NOT about the issues that I or others face at this
>>> time. They are serious enough but that is not for this thread. People are
>>> working hard to find a solution for now.  That is cool.
>>>
>>> What I want to know is are we technically and financially ready for a
>>> continued exponential growth. If so, what are the plans and what if those
>>> plans are needed in half the time expected. Are we ready for a continued
>>> growth. When we hesitate we will lose the opportunities that are currently
>>> open to us.
>>> Thanks,
>>>GerardM
>>>
>>> On Fri, 3 May 2019 at 16:24, Thad Guidry  wrote:
>>>
>>>> Gerard mentioned the PROBLEM in the 2nd sentence.  I read it clearly
>>>>
>>>> >we all experience in the really bad response times we are suffering.
>>>> It is so bad that people are asked what kind of updates they are running
>>>> because it makes a difference in the lag times there are.
>>>>
>>>> The response times are typically attributed to SPARQL queries from what
>>>> I have seen, as well as applying multiple edits with scripts or mass
>>>> operations. Although I recall there is a light queue mechanism inherent in
>>>> the Blazegraph architecture that contributes to this, and I am fine with
>>>> slower writes.
>>>>
>>>> What most users are not comfortable with is the slower reads in
>>>> different areas of Wikidata.
>>>> We need to identify those slow read areas or figure out a way to get
>>>> consensus on what parts of Wikidata reading affect our users the most.
>>>>
>>>> So let's be constructive here:
>>>> Gerard - did you have specific areas that affect your daily work, and
>>>> what from of work is that (reading/writing , which areas) ?
>>>>
>>>> Thad
>>>> https://www.linkedin.com/in/thadguidry/
>>>> ___
>>>> Wikidata mailing list
>>>> Wikidata@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>
>>
>> --
>>
>>
>> ---
>> Marco Neumann
>> KONA
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Are we ready for our future

2019-05-03 Thread Marco Neumann
Gerard, I like Wikidata a lot, kudos to the community for keeping it going.
But keep it real, there is no exponential growth here.

We are looking at slow and sustainable growth at the moment, with a possible
plateauing of the number of users and of the total number of Wikidata items.
Just take a look at the statistics.

Date | Content pages | Page edits since Wikidata was set up | Registered
users | Active users

4/2015  | 13,911,417  | 213,027,375 | 1,913,828 | 15,168
5/2016  | 17,432,789  | 328,781,525 | 2,688,788 | 16,833
7/2017  | 28,037,196  | 514,252,789 | 2,835,219 | 18,081
7/2018  | 49,081,962  | 701,319,718 | 2,970,150 | 18,578
4/2019  | 56,377,647  | 931,449,205 | 3,236,569 | 20,857

When you refer to "growing like a weed", what is that: page views? Queries
per day? Mentions in the media?

Best,
Marco




On Fri, May 3, 2019 at 3:36 PM Gerard Meijssen 
wrote:

> Hoi,
> This mail thread is NOT about the issues that I or others face at this
> time. They are serious enough but that is not for this thread. People are
> working hard to find a solution for now.  That is cool.
>
> What I want to know is are we technically and financially ready for a
> continued exponential growth. If so, what are the plans and what if those
> plans are needed in half the time expected. Are we ready for a continued
> growth. When we hesitate we will lose the opportunities that are currently
> open to us.
> Thanks,
>GerardM
>
> On Fri, 3 May 2019 at 16:24, Thad Guidry  wrote:
>
>> Gerard mentioned the PROBLEM in the 2nd sentence.  I read it clearly
>>
>> >we all experience in the really bad response times we are suffering. It
>> is so bad that people are asked what kind of updates they are running
>> because it makes a difference in the lag times there are.
>>
>> The response times are typically attributed to SPARQL queries from what I
>> have seen, as well as applying multiple edits with scripts or mass
>> operations. Although I recall there is a light queue mechanism inherent in
>> the Blazegraph architecture that contributes to this, and I am fine with
>> slower writes.
>>
>> What most users are not comfortable with is the slower reads in different
>> areas of Wikidata.
>> We need to identify those slow read areas or figure out a way to get
>> consensus on what parts of Wikidata reading affect our users the most.
>>
>> So let's be constructive here:
>> Gerard - did you have specific areas that affect your daily work, and
>> what from of work is that (reading/writing , which areas) ?
>>
>> Thad
>> https://www.linkedin.com/in/thadguidry/
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Election data

2018-03-11 Thread Marco Neumann
We had a go at this during WikidataCon 2017 in Berlin.

My takeaway from that session is that there is indeed an emerging pattern
for representing data related to political events, organizations,
individuals and themes on Wikidata, but that at the end of the day you will
have to make your own decisions to accommodate localized idiosyncrasies in the data.

you can take look at the documentation here:

Well structured political data for the whole world - impossible Utopia or
Wikidata at its best by Lucy Chambers
https://www.wikidata.org/wiki/Wikidata:WikidataCon_2017/Documentation

and the Politics Track:

Wikidata and Open Government (Meta)Data - A missing link?
Well structured political data for the whole world: impossible utopia, or
Wikidata at its best?
Politics Meetup


Marco

On Sun, Mar 11, 2018 at 9:06 AM, Amir E. Aharoni <
amir.ahar...@mail.huji.ac.il> wrote:

> Hi,
>
> Different countries have different processes for elections: registration
> of political parties and candidates, the form of the ballot, the system for
> counting the votes and dividing the seats in the legislature, etc.
>
> I was surprised not to find almost any data about electoral history for
> Israeli political parties. I also couldn't find it for other famous
> political parties around the world, like CDU (Germany), Syriza (Greece), or
> GOP (U.S.).
>
> Representing it in Wikidata is somewhat challenging because of the
> differences between the countries, because parties, candidates lists, and
> fractions are different things, etc. But only somewhat: it definitely
> doesn't sound impossible.
>
> Is there any country whose electoral processes and history are represented
> comprehensively in Wikidata? If not, were there any attempts to even
> discuss it?
>
> Thanks!
>
> --
> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
> http://aharoni.wordpress.com
> ‪“We're living in pieces,
> I want to live in peace.” – T. Moore‬
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-tech] Report on loading wikidata

2017-12-07 Thread Marco Neumann
Did you try to point the WDQS copy to your TDB/Fuseki endpoint?

On Thu, 7 Dec 2017 at 18:58, Andy Seaborne <a...@apache.org> wrote:

> Dell XPS 13 (model 9350) - the 2015 model.
> Ubuntu 17.10, not a VM.
> 1T SSD.
> 16G RAM.
> Two volumes = root and user.
> Swappiness = 10
>
> java version "1.8.0_151" (OpenJDK)
>
> Data: latest-truthy.nt.gz (version of 2017-11-24)
>
> == TDB1, tdbloader2
>8 hours // 76,164 TPS
>
> Using SORT_ARGS: --temporary-directory=/home/afs/Datasets/tmp
> to make sure the temporary files are on the large volume.
>
> The run took 28877 seconds and resulted in a 173G database.
>
> All the index files are the same size.
>
> node2id : 12G
> OSP : 53G
> SPO : 53G
> POS : 53G
>
> Algorithm:
>
> Data phase:
>
> parse file, create node table and a temporary file of all triples (3x 64
> bit numbers, written in text.
>
> Index phase:
>
> for each index, sort the temp file (using sort(1), an external sort
> utility), and make the index file by writing the sorted results, filling
> the data blocks and creating any tree blocks needed. This is a
> stream-write process - calculate the data block, write it out when full
> and never touch it again.
>
> This results in data blocks being completely full, unlike the standard
> B+Tree insertion algorithm. It is why indexes are exactly the same size.
>
> Building SPO is faster because the data is nearly sorted to start with.
> Data often tends to arrive grouped by subject.
>
> tdbloader2 is doing stream (append) I/O on index files, not a random
> access pattern.
>
> == TDB1 tdbloader1
>29 hours 43 minutes // 20,560 TPS
>
> 106,975 seconds
> 297G DB-truthy
>
> node2id: 12G
> OSP: 97G
> SPO: 96G
> POS: 98G
>
> Same size node2id table, larger indexes.
>
> Algorithm:
>
> Data phase:
>
> parse the file and create the node table and the SPO index.
> The creation of SPO is by b+tree insert so blocks are partially full
> (average is empirically about 2/3 full). When a block fills up, it is
> split into 2.  The node table is exactly the same as tdbloader2 because
> nodes are stored in the same order.
>
> Index phase:
>
> for each index, copy SPO to the index.  This is a tree sort and the
> access pattern on blocks is fairly random which is a bad thing. Doing
> one at a time is faster than two together because more RAM in the
> OS-managed file system cache, is devoted to caching one index.  A cache
> miss is a possible write to disk, and always a read from disk, which is
> a lot of work even with an SSD.
>
> Stream reading SPO is efficient - it is not random I/O, it is stream I/O.
>
> Once the cache-efficiency of the OS disk cache drops, tdbloader slows
> down markedly.
>
> == Comparison of TDB1 loaders.
>
> Building an index is a sort because the B+Trees hold data sorted.
>
> The approach of tdbloader2 is to use an external sort algorithm (i.e.
> sort larger than RAM using temporary files) done by a highly tuned
> utility, unix sort(1).
>
> The approach of tdbloader1 is to copy into a sorted datastructure. For
> example, copying index SPO to POS, it is creating a file with keys
> sorted by P then O then S, which is not the arrival order which is
> S-sorted.  tdbloader1 maximises OS caching of memory mapped files by
> doing indexes one at a time.  Experimentation shows that doing two at
> once is slower, and doing two in parallel is no better and sometimes
> worse, than doing sequentially.
>
> == TDB2
>
> TDB2 is experimental.  The current TDB2 loader is a functional placeholder.
>
> It is writing all three indexes at the same time.  While for SPO this is
> not a bad access pattern (subjects are naturally grouped), for POS and
> OSP, the I/O is a random pattern, not a stream pattern.  There is more
> than double contention for OS disk cache, hence it is slow and gets
> slower faster.
>
> == More details.
>
> For more information, consult the Jena dev@ and user@ archives and the
> code.
>
-- 


---
Marco Neumann
KONA
___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata] dispute a claim on an item

2017-11-06 Thread Marco Neumann
Thank you, Tony. I'm still looking forward to some kind of reconciliation
process here in the future. It would make Wikidata even more appealing
as an information source.

Marco


On Mon, Nov 6, 2017 at 9:28 AM, Tony Bowden <t...@mysociety.org> wrote:
> In this case it was fairly easy to find some suitable sources, so I've
> updated the claim, and added references. As the original claim was
> unsourced, I think it's fine to simply replace it, rather than marking
> it as deprecated.
>
> Tony
>
> On 5 November 2017 at 18:39, Nicolas VIGNERON
> <vigneron.nico...@gmail.com> wrote:
>> 2017-11-05 19:13 GMT+01:00 Marco Neumann <marco.neum...@gmail.com>:
>>>
>>> Andrew,
>>>
>>> what would be your first choice for conflict resolution here? write an
>>> entry into the relevant item/discuss page? or go for a Requests for
>>> comment on the Community portal? or to contact the claim author
>>> directly?
>>
>>
>> Hi,
>>
>> For resolving this specific case, I'd go on the talk page of the item
>> https://www.wikidata.org/w/index.php?title=Talk:Q16191299 and notify the
>> people concerned (in this case, the wikidatian that added the claim but,
>> ideally, also the main contributors of the Wikipedia article, « with enough
>> eyeballs »). Especially in this case where the bottom of the issue seems to
>> be the lack of sources.
>>
>> Community pages are more for broad question or if previous discussions
>> failed to come to a consensus.
>>
>> Cdlt, ~nicolas
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata



-- 


---
Marco Neumann
KONA

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] dispute a claim on an item

2017-11-05 Thread Marco Neumann
GerardM,

I don't mind the Wikidata mess; it's part of the open data ecosystem it
tries to embrace, and it will actually allow the project to grow along
some interesting real-world data challenges.

By the way, we could use the ranking feature to exclude the statement from
some of the queries, so a "disputed" flag on the ranking could be a
good fit.
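
As a rough sketch of what the ranking machinery already exposes in WDQS
(nothing new, just the existing p:/ps:/wikibase:rank model), the rank of each
citizenship statement on that item can be inspected with something like:

SELECT ?value ?rank WHERE {
  wd:Q16191299 p:P27 ?statement .
  ?statement ps:P27 ?value ;
             wikibase:rank ?rank .
}

A deprecated statement then drops out of the simple wdt:P27 view while
staying queryable here.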

For the specific item I am now certain that the citizenship
association is currently incorrect, or at least incomplete. This might
change again over time with future events and a possible entitlement
to dual citizenship status for this Q5.

Marco








On Sun, Nov 5, 2017 at 5:36 PM, Gerard Meijssen
<gerard.meijs...@gmail.com> wrote:
> Hoi,
> No Sjoerd, the primary post is about issues how to fix them and how to
> signal them.
>
> When it is about nationality, it is an old story and as far as I am
> concerned countries have a start date and often an end date. Prior to this
> start date the country does not exist. During the epoch of the existence of
> a country its borders change. Add to that the fact that people may have
> double nationalities or no nationality at all.. In the end it is a morass.
> Thanks,
>GerardM
>
> On 5 November 2017 at 17:27, Sjoerd de Bruin <sjoerddebr...@me.com> wrote:
>>
>> You are not answering the question (not the first time). P27 has a lot of
>> problems, due to people using it for citizenship and nationality at the same
>> time. Put this in combination with the "fantastic" Wikipedia category
>> system! AFAIK there is still some discussion going on here:
>> https://www.wikidata.org/wiki/Wikidata:Property_proposal/Nationality
>>
>> You can fix some items manually, but a good discussion and reform about
>> how we structure this kind of information is needed.
>>
>> Greetings,
>>
>> Sjoerd de Bruin
>> sjoerddebr...@me.com
>>
>> Op 5 nov. 2017, om 12:04 heeft Gerard Meijssen <gerard.meijs...@gmail.com>
>> het volgende geschreven:
>>
>> Hoi,
>> There is much more to this. When a publication has been denounced, when
>> the author is denounced for having it ghost written. When ghost written is
>> not to reflect because of the stigma involved.. We should forcefully flag
>> publications, findings and authors when there is a problem.. A query should
>> not include what they publiced what hey "found".
>>
>> At this moment Wikidata is very much a stamp collection and we should be
>> more than that.
>> Thanks,
>>  GerardM
>>
>> On 5 November 2017 at 11:40, Marco Neumann <marco.neum...@gmail.com>
>> wrote:
>>>
>>> What's the current procedure for disputing a non trivial claim on a
>>> wikidata item?
>>>
>>> I know I can just go ahead and change a claim (statement and/or its
>>> value) but the dispute itself would only be captured in the change-log
>>> of the respective wikidata instance.
>>>
>>> Would one create a discussion entry on the item page first to motivate
>>> a change on an item that's not straight forward?
>>>
>>> so for example on the item
>>>
>>> Paul Staines
>>> https://www.wikidata.org/wiki/Q16191299
>>>
>>> it states that person has
>>>
>>> :country of citizenship :United Kingdom
>>> (a claim created by  Rpfb119 on 1 April 2015‎ )
>>>
>>> but on wikipedia-en it says nationality Irish without a reference
>>> https://en.wikipedia.org/wiki/Paul_Staines
>>>
>>> is there or should there be a qualifier/reference to flag a statement
>>> to be in dispute?
>>>
>>> Also is this mailing-list the best place to discuss such (item
>>> specific) matters? Or is the Wikidata community portal with the
>>> Requests for comment service a better place?
>>>
>>>
>>> thx
>>> Marco
>>>
>>>
>>> --
>>>
>>>
>>> ---
>>> Marco Neumann
>>> KONA
>>>
>>> _______
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 


---
Marco Neumann
KONA

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] dispute a claim on an item

2017-11-05 Thread Marco Neumann
What's the current procedure for disputing a non-trivial claim on a
Wikidata item?

I know I can just go ahead and change a claim (statement and/or its
value) but the dispute itself would only be captured in the change-log
of the respective wikidata instance.

Would one create a discussion entry on the item page first to motivate
a change on an item that's not straightforward?

so for example on the item

Paul Staines
https://www.wikidata.org/wiki/Q16191299

it states that person has

:country of citizenship :United Kingdom
(a claim created by  Rpfb119 on 1 April 2015‎ )

but on wikipedia-en it says nationality Irish without a reference
https://en.wikipedia.org/wiki/Paul_Staines

Is there, or should there be, a qualifier/reference to flag a statement
as being in dispute?

Also, is this mailing list the best place to discuss such (item-specific)
matters? Or is the Wikidata community portal, with its Requests for
comment process, a better place?


thx
Marco


-- 


---
Marco Neumann
KONA

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] claim change ignored

2017-10-22 Thread Marco Neumann
Thanks, Maarten. There still seems to be a mix-up with the data though, e.g.
the image information.
On Sun, 22 Oct 2017 at 18:49, Maarten Dammers <maar...@edamers.nl> wrote:

> Hi Marco,
>
> Op 21-10-2017 om 14:48 schreef Marco Neumann:
> > in any event it's a false claim in this example and I will remove the
> > claim now. 2-2=0 ;)
> I undid your edit. You seem to be mixing up father (
> https://www.wikidata.org/wiki/Q2650401 ) and child (
> https://www.wikidata.org/wiki/Q15434505). Description also updated to
> make the difference clearer.
>
> Maarten
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] claim change ignored

2017-10-21 Thread Marco Neumann
I see, thanks Nicolas. It would be nice to have a constraint here that allows
only one claim if the value is exactly the same.

In any event it's a false claim in this example and I will remove the
claim now. 2-2=0 ;)



On Sat, Oct 21, 2017 at 2:10 PM, Nicolas VIGNERON
<vigneron.nico...@gmail.com> wrote:
> Hi,
>
> The bot removed one claim, but there was two identical claims (see the item
> before the removal :
> https://www.wikidata.org/w/index.php?title=Q2650401=145856801 ).
>
> 2-1=1, this is logical.
>
> Cdlt, ~nicolas
>
> 2017-10-21 13:59 GMT+02:00 Marco Neumann <marco.neum...@gmail.com>:
>>
>> according to the Revision history of "Alois Rainer" (Q2650401) the
>> item should not hold claim P570 (date of death)
>>
>> https://www.wikidata.org/wiki/Q2650401
>>
>> the claim was removed in 2014
>>
>>  09:40, 20 July 2014‎ KrBot (talk | contribs)‎ . . (3,531 bytes)
>> (-371)‎ . . (‎Removed claim: date of death (P570): 14 May 2002)
>>
>> is there an explanation why the claim is still attached to the wikidata
>> item?
>>
>> Marco
>>
>> --
>>
>>
>> ---
>> Marco Neumann
>> KONA
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 


---
Marco Neumann
KONA

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] claim change ignored

2017-10-21 Thread Marco Neumann
According to the revision history of "Alois Rainer" (Q2650401), the
item should not hold claim P570 (date of death).

https://www.wikidata.org/wiki/Q2650401

the claim was removed in 2014

 09:40, 20 July 2014‎ KrBot (talk | contribs)‎ . . (3,531 bytes)
(-371)‎ . . (‎Removed claim: date of death (P570): 14 May 2002)

Is there an explanation why the claim is still attached to the Wikidata item?

Marco

-- 


---
Marco Neumann
KONA

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Turning Lists to Wikidata

2017-10-18 Thread Marco Neumann
On Wed, Oct 18, 2017 at 12:50 PM, Yuri Astrakhan
<yuriastrak...@gmail.com> wrote:
>
>> when you say "wikidata is not well  suited for lists data", you refer
>> to wikibase or WDQS here?
>
>
> Wikibase, per Daniel K.
>>
>>
>> the  data:Bea.gov/GDP by state.tab above is certainly a good
>> representation for efficient delivery (via json) and display of data.
>> but inefficient for further data sharing without URIs.
>
>
> What do you mean by further sharing without uri?

If the cell is identified by a string rather than a URI (as is the
case in the example above), disambiguation is necessary and error
prone.

e.g. Berlin

wd:Q64
wd:Q4579913
wd:Q5932836
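
A quick sketch of the ambiguity on the standard WDQS endpoint (the label
service line is only there for readable output):

SELECT ?item ?itemDescription WHERE {
  ?item rdfs:label "Berlin"@en .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20

Every row is a different entity whose English label is the same string.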

> Btw, there is a wikidata property to link to the .map data pages. Not sure
> about the .tab, but might also work
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 


---
Marco Neumann
KONA

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Turning Lists to Wikidata

2017-10-18 Thread Marco Neumann
When you say "wikidata is not well suited for lists data", do you refer
to Wikibase or to WDQS here?

The data:Bea.gov/GDP by state.tab page above is certainly a good
representation for efficient delivery (via JSON) and display of data,
but it is inefficient for further data sharing without URIs.

On Tue, Oct 17, 2017 at 9:08 PM, Yuri Astrakhan <yuriastrak...@gmail.com> wrote:
> There is a better alternative to storing lists -
> https://www.mediawiki.org/wiki/Help:Tabular_Data -- it allows you to store a
> CSV-like table of data on Commons, with localized columns, and access it
> from all other wikis from the  and Lua scripts.
>
> A good example of it -- "per state GDP" page  -- see graph in the upper
> right corner.
>
> Page: List_of_U.S._states_by_GDP
> Data: GDP_by_state.tab
>
> Wikidata is not very well  suited for lists data, but tabular data was
> designed for a relatively large (up to 2mb) lists.
>
> On Tue, Oct 17, 2017 at 12:46 PM, Marco Neumann <marco.neum...@gmail.com>
> wrote:
>>
>> great I will be on site as well, so we will have a little more time to
>> discuss this in detail.
>>
>> On Tue, Oct 17, 2017 at 4:30 PM, Antonin Delpeuch (lists)
>> <li...@antonin.delpeuch.eu> wrote:
>> > Hi Marco,
>> >
>> > I agree that many of these lists and tables could be harvested (with
>> > some care, of course).
>> >
>> > However, I don't think that the information they contain should go to
>> > the Wikidata item they are associated with. This Wikidata item mostly
>> > exists to store inter-language links, but is poorly connected to the
>> > rest of the knowledge graph. This tax revenue and GDP information should
>> > go to the country items themselves.
>> >
>> > I am working on the problem of extraction of statements from lists and
>> > tables and will write a tutorial when the tools are ready. WikidataCon
>> > attendees might have a glimpse of that in this session:
>> >
>> > https://www.wikidata.org/wiki/Wikidata:WikidataCon_2017/Submissions/OpenRefine_demo
>> >
>> > Cheers,
>> > Antonin
>> >
>> > On 17/10/2017 12:07, Marco Neumann wrote:
>> >> I have noticed a lack of actual data in wikidata representations of
>> >> wikipedia list.
>> >>
>> >> for example
>> >>
>> >> List of countries by tax revenue to GDP ratio
>> >>
>> >> https://en.wikipedia.org/wiki/List_of_countries_by_tax_revenue_to_GDP_ratio
>> >>
>> >> to
>> >>
>> >> List of countries by tax revenue as percentage of GDP (Q2529105)
>> >> https://www.wikidata.org/wiki/Q2529105
>> >>
>> >>
>> >> is there currently a development in the wikidata community to
>> >> transform these lists into wikibase items and last but not least
>> >> produce RDF respresentations for WDQS?
>> >>
>> >> best,
>> >> Marco
>> >>
>> >
>> >
>> > ___
>> > Wikidata mailing list
>> > Wikidata@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>>
>> --
>>
>>
>> ---
>> Marco Neumann
>> KONA
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 


---
Marco Neumann
KONA

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Turning Lists to Wikidata

2017-10-17 Thread Marco Neumann
Great, I will be on site as well, so we will have a little more time to
discuss this in detail.

On Tue, Oct 17, 2017 at 4:30 PM, Antonin Delpeuch (lists)
<li...@antonin.delpeuch.eu> wrote:
> Hi Marco,
>
> I agree that many of these lists and tables could be harvested (with
> some care, of course).
>
> However, I don't think that the information they contain should go to
> the Wikidata item they are associated with. This Wikidata item mostly
> exists to store inter-language links, but is poorly connected to the
> rest of the knowledge graph. This tax revenue and GDP information should
> go to the country items themselves.
>
> I am working on the problem of extraction of statements from lists and
> tables and will write a tutorial when the tools are ready. WikidataCon
> attendees might have a glimpse of that in this session:
> https://www.wikidata.org/wiki/Wikidata:WikidataCon_2017/Submissions/OpenRefine_demo
>
> Cheers,
> Antonin
>
> On 17/10/2017 12:07, Marco Neumann wrote:
>> I have noticed a lack of actual data in wikidata representations of
>> wikipedia list.
>>
>> for example
>>
>> List of countries by tax revenue to GDP ratio
>> https://en.wikipedia.org/wiki/List_of_countries_by_tax_revenue_to_GDP_ratio
>>
>> to
>>
>> List of countries by tax revenue as percentage of GDP (Q2529105)
>> https://www.wikidata.org/wiki/Q2529105
>>
>>
>> is there currently a development in the wikidata community to
>> transform these lists into wikibase items and last but not least
>> produce RDF respresentations for WDQS?
>>
>> best,
>> Marco
>>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata



-- 


---
Marco Neumann
KONA

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Turning Lists to Wikidata

2017-10-17 Thread Marco Neumann
I have noticed a lack of actual data in Wikidata representations of
Wikipedia lists.

for example

List of countries by tax revenue to GDP ratio
https://en.wikipedia.org/wiki/List_of_countries_by_tax_revenue_to_GDP_ratio

to

List of countries by tax revenue as percentage of GDP (Q2529105)
https://www.wikidata.org/wiki/Q2529105


Is there currently an effort in the Wikidata community to
transform these lists into Wikibase items and, last but not least,
produce RDF representations for WDQS?
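
For reference, a minimal sketch of how such a list item is already connected
to its Wikipedia articles in the WDQS RDF (the statement data itself is what
is largely missing):

SELECT ?article WHERE {
  ?article schema:about wd:Q2529105 .
}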

best,
Marco

-- 


---
Marco Neumann
KONA

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata-tech] query with incomplete result set

2017-10-03 Thread Marco Neumann
Thank you Lucas and Stas, this works for me.

So would it be fair to say that p:P39 bypasses the ranking semantics of
wdt:P39*? For my own understanding, why is a wdt property
called a direct property**?

* https://www.wikidata.org/wiki/Help:Ranking
**https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Basics_-_Understanding_Prefixes
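
For readability, the fixed query behind the link in Stas's reply quoted below
decodes to roughly:

SELECT ?MdB ?MdBLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?MdB p:P39 ?xRefNode.
  ?xRefNode pq:P2937 wd:Q30579723;
            ps:P39 wd:Q1939555.
}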

On Tue, Oct 3, 2017 at 7:24 PM, Stas Malyshev <smalys...@wikimedia.org> wrote:
> Hi!
>
> On 10/3/17 4:02 PM, Marco Neumann wrote:
>> why doesn't the following query produce
>> http://www.wikidata.org/entity/Q17905 in the result set?
>
> The query asks for wdt:P39 wd:Q1939555, however current preferred value
> for P39 there is Q29576752. When the item has preferred value, only this
> value shows up in wdt. If you want all values, use something like:
>
> https://query.wikidata.org/#SELECT%20%3FMdB%20%3FMdBLabel%20WHERE%20%7B%0ASERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%3FMdB%20p%3AP39%20%3FxRefNode.%20%0A%3FxRefNode%20pq%3AP2937%20wd%3AQ30579723%3B%0A%20%20%20ps%3AP39%20wd%3AQ1939555.%0A%7D
>
> Or change "preferred" status on Q17905:P39.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org



-- 


---
Marco Neumann
KONA

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


[Wikidata-tech] query with incomplete result set

2017-10-03 Thread Marco Neumann
Why doesn't the following query produce
http://www.wikidata.org/entity/Q17905 in the result set?

https://query.wikidata.org/#SELECT%20%3FMdB%20%3FMdBLabel%20WHERE%20%7B%0ASERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%3FMdB%20wdt%3AP39%20wd%3AQ1939555%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20p%3AP39%20%3FxRefNode.%20%0A%3FxRefNode%20pq%3AP2937%20wd%3AQ30579723.%0A%7D
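
Decoded, that query is roughly:

SELECT ?MdB ?MdBLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?MdB wdt:P39 wd:Q1939555;
       p:P39 ?xRefNode.
  ?xRefNode pq:P2937 wd:Q30579723.
}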

Thx

-- 


---
Marco Neumann
KONA

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


Re: [Wikidata] How to split a label by whitespace in WDQS ?

2017-09-19 Thread Marco Neumann
This is something you might want to bring up with the Blazegraph team.
Jena, for example, provides the apf:strSplit SPARQL property function in ARQ.
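
From memory, and only as a sketch (the exact prefix and argument order should
be checked against the Jena ARQ docs), usage looks roughly like:

PREFIX apf: <http://jena.apache.org/ARQ/property#>
SELECT ?token WHERE {
  ?token apf:strSplit ("Castle of Saint Pée sur Nivelle" " ") .
}

with one binding of ?token per whitespace-separated part.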

On Tue, Sep 19, 2017 at 2:39 PM, Thad Guidry <thadgui...@gmail.com> wrote:
> Thanks Christopher,
>
> But I really am looking to split by whitespace, with an unknown of how many
> tokens in a label.  My example of human names was just to simplify, but
> could be anything... not just human names.  Any Wikidata QID.
> Like "Castle of Saint Pée sur Nivelle"
> I would want 6 columns automatically created for that. Or in JSON terms.. An
> array of string objects.
> {
> "Castle",
> "of",
> "Saint",
> "Pée",
> "sur",
> "Nivelle",
> }
>
> This has to do with a use case of pre-processing the label names for data
> ingestion into further analysis workflows.
> I was hoping that I could easily leverage a bit of horsepower for free from
> the WDQS for this (splitting label names)...perhaps even using the Label
> service itself to do the splitting.
>
> The indexing service behind the scenes already stores much of this, and
> stores those tokens for each label.
> The problem is that we don't currently have a way to get the tokens of a
> label for any particular QID and its labels in various languages.
> And that's what I want to solve, either through SPARQL or an enhancement to
> the Label service or something else.
> If the answer is that I will have to resort to my own programmatic methods
> via the dump files then so be it, I guess, but I'd rather not have to put in
> the work for something that is done already behind the scenes.
>
> -Thad
> +ThadGuidry
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 


---
Marco Neumann
KONA

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] How to split a label by whitespace in WDQS ?

2017-09-18 Thread Marco Neumann
This covers some of your query (note that SPARQL's SUBSTR is 1-based and
takes numeric arguments):

SELECT ?human (SUBSTR(?label, 1, 3) AS ?title) (SUBSTR(?label, 5) AS ?name)
       (STRENDS(SUBSTR(?label, 5), "y") AS ?nameEndsWithY)
WHERE
{
  ?human wdt:P31 wd:Q15632617; rdfs:label ?label.
  FILTER(LANG(?label) = "en").
  FILTER(STRSTARTS(?label, "Mr. ")).
}

On Mon, Sep 18, 2017 at 1:07 PM, Thad Guidry <thadgui...@gmail.com> wrote:
>
> Say I have this query...
>
> SELECT ?human ?label
> WHERE
> {
>   ?human wdt:P31 wd:Q15632617; rdfs:label ?label.
>   FILTER(LANG(?label) = "en").
>   FILTER(STRSTARTS(?label, "Mr. ")).
> }
>
> What if I wanted to see if any one of a humans name ends with "y" such as my
> last name does , their first, last, doesn't matter.  I have a "d" and a "y"
> on the array returned from my name (if it were split by whitespace)
>
> I did not see any special syntax or FILTER or Label service commands to help
> with splitting apart a Label by whitespace and then applying a filter on
> each string.
>
> How would I accomplish this ?
>
> Thad
> +ThadGuidry
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>



-- 


---
Marco Neumann
KONA

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata