[Wikidata] Re: Easiest way to get all sitelists counts > 0?

2022-03-23 Thread Imre Samu
> In the queryable Wikidata model, there is a property wikibase:sitelinks
whose value is an integer that is the number of Wikipedia sites that the
item appears on if it is on at least one site.
> This is what I'm after.  I'm not sure that this value is in the RDF dumps
and in the smaller truthy dumps, in particular.

As far as I can see, the "latest-all.nt.bz2" contains the "sitelinks" info (
downloaded from https://dumps.wikimedia.org/wikidatawiki/entities/ )

$ bzcat latest-all.nt.bz2 | grep sitelink | head
 <http://wikiba.se/ontology#sitelinks> "345"^^<http://www.w3.org/2001/XMLSchema#integer> .
 <http://wikiba.se/ontology#sitelinks> "149"^^<http://www.w3.org/2001/XMLSchema#integer> .
 <http://wikiba.se/ontology#sitelinks> "235"^^<http://www.w3.org/2001/XMLSchema#integer> .
 <http://wikiba.se/ontology#sitelinks> "26"^^<http://www.w3.org/2001/XMLSchema#integer> .
 <http://wikiba.se/ontology#sitelinks> "116"^^<http://www.w3.org/2001/XMLSchema#integer> .
 <http://wikiba.se/ontology#sitelinks> "29"^^<http://www.w3.org/2001/XMLSchema#integer> .
 <http://wikiba.se/ontology#sitelinks> "119"^^<http://www.w3.org/2001/XMLSchema#integer> .
 <http://wikiba.se/ontology#sitelinks> "338"^^<http://www.w3.org/2001/XMLSchema#integer> .
 <http://wikiba.se/ontology#sitelinks> "292"^^<http://www.w3.org/2001/XMLSchema#integer> .
 <http://wikiba.se/ontology#sitelinks> "138"^^<http://www.w3.org/2001/XMLSchema#integer> .
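
If you only need the per-item counts, a minimal sketch that streams the dump could look like this ( the file name is the one above; the regular expression is just my own guess at the line layout, not an official parser ):

import bz2
import re

# matches the wikibase:sitelinks triples shown above, capturing the Q-id and the count
PATTERN = re.compile(
    r'<http://www\.wikidata\.org/entity/(Q\d+)>\s+'
    r'<http://wikiba\.se/ontology#sitelinks>\s+'
    r'"(\d+)"')

with bz2.open("latest-all.nt.bz2", "rt", encoding="utf-8") as dump:
    for line in dump:
        m = PATTERN.match(line)
        if m and int(m.group(2)) > 0:      # the original question: only counts > 0
            print(m.group(1), m.group(2), sep="\t")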

>  the number of Wikipedia sites

For example, the first line in my example is Q31 = Belgium ( country in western Europe )   https://www.wikidata.org/wiki/Q31
 <http://www.wikidata.org/entity/Q31> <http://wikiba.se/ontology#sitelinks> "345"^^<http://www.w3.org/2001/XMLSchema#integer> .

Q31.sitelinks = 345
  == [ Wikipedia (278 entries) + Wikibooks (3 entries) + Wikinews (30 entries) + Wikiquote (12 entries) + Wikivoyage (21 entries) + Multilingual sites (1 entry) ]

It is not entirely clear to me whether you need the "278" (Wikipedia only) or the "345" (all sitelinks) as a result.


Kind regards,
 Imre
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: Easiest way to get all sitelists counts > 0?

2022-03-22 Thread Imre Samu
>  sitelinks  /  I want to use the data to help rank possible text entity
links to Wikidata items

side note:
I am helping the https://www.naturalearthdata.com/ project by adding wikidata concordances.
It is a public domain geo-database ... with [ mountains, rivers, populated places, .. ]
I am using the wikidata JSON dumps and importing them into a PostGIS database.
And I am ranking the matches with
- distance ( lower is better )
- text similarity ( I am checking the "labels" and the "aliases" )
- and sitelinks!  ( see the toy scoring sketch below )
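
A minimal sketch of that kind of combined score ( the weights and the helper name are made up for illustration; the real ranking happens in the PostGIS database ):

IMPORTED_WIKIS = {"cebwiki", "svwiki"}   # "mostly imported" wikis get a lower weight

def match_score(distance_km, name_similarity, sitelinks):
    """sitelinks: list of site ids, e.g. ["dewiki", "cebwiki", ...];
    name_similarity: label/alias similarity in [0, 1]."""
    weighted_links = sum(0.1 if s in IMPORTED_WIKIS else 1.0 for s in sitelinks)
    return (-1.0 * distance_km          # lower distance is better
            + 5.0 * name_similarity     # closer label/alias match is better
            + 0.5 * weighted_links)     # more (non-imported) sitelinks is better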

And I am lowering the rank of the "mostly imported" sitelinks ( "cebwiki" , ... )
why? :
https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2017/08#Nonsense_imported_from_Geonames

Because a lot of geodata was re-imported, and the "distance" and the "text/labels" are the same.
So be careful with the imported Wikipedia pages! ( sitelinks )
Now, as I see it, the geodata quality is much better - mostly where the active Wikidata community is cleaning it up.

It is just an example of why the simple "sitelinks" number is not enough :-)

On the other hand: probably the P625 coordinate location is also important.   https://www.wikidata.org/wiki/Property:P625
In Germany, the "dewiki" sitelink ranks higher;
in Hungary, the "huwiki" sitelink is preferred.

Kind Regards,
 Imre





 wrote (on 22 Mar 2022, Tue, 22:25):

> Is there a simple way to get the sitelinks count data for all Wikidata
> items?  I want to use the data to help rank possible text entity links to
> Wikidata items
>
> I'm really only interested in counts for items that have at least one
> (e.g., wikibase:sitelinks value that's >0).  According to statistics I've
> seen, only about 1/3 of Wikidata items have at least one sitelink.
>
> I'm not sure if wikibase:sitelinks is included in the standard WIkidata
> dump.  I could try a SPARQL query with an OFFSET and LIMIT, but I doubt
> that the approach would work to completion.
> ___
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: Easiest way to get all sitelists counts > 0?

2022-03-22 Thread Imre Samu
> I'm not sure if wikibase:sitelinks is included in the standard WIkidata
dump.

As far as I can see, it is in the JSON dump.
https://www.wikidata.org/wiki/Wikidata:Database_download#JSON_dumps_(recommended)

https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_json.html#json_sitelinks

example:

{
  "sitelinks": {
    "afwiki": {
      "site": "afwiki",
      "title": "New York Stad",
      "badges": []
    },
    "frwiki": {
      "site": "frwiki",
      "title": "New York City",
      "badges": []
    },
    "nlwiki": {
      "site": "nlwiki",
      "title": "New York City",
      "badges": [
        "Q17437796"
      ]
    },
    "enwiki": {
      "site": "enwiki",
      "title": "New York City",
      "badges": []
    },
    "dewiki": {
      "site": "dewiki",
      "title": "New York City",
      "badges": [
        "Q17437798"
      ]
    }
  }
}
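
So the counts can be computed while streaming the dump; a minimal sketch ( assuming the usual one-entity-per-line layout of latest-all.json.gz, with the field names from the example above ):

import gzip
import json

with gzip.open("latest-all.json.gz", "rt", encoding="utf-8") as dump:
    for line in dump:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):        # skip the array brackets / empty lines
            continue
        entity = json.loads(line)
        count = len(entity.get("sitelinks", {}))
        if count > 0:                     # only items with at least one sitelink
            print(entity["id"], count, sep="\t")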



Kind Regards,
Imre


 wrote (on 22 Mar 2022, Tue, 22:25):

> Is there a simple way to get the sitelinks count data for all Wikidata
> items?  I want to use the data to help rank possible text entity links to
> Wikidata items
>
> I'm really only interested in counts for items that have at least one
> (e.g., wikibase:sitelinks value that's >0).  According to statistics I've
> seen, only about 1/3 of Wikidata items have at least one sitelink.
>
> I'm not sure if wikibase:sitelinks is included in the standard WIkidata
> dump.  I could try a SPARQL query with an OFFSET and LIMIT, but I doubt
> that the approach would work to completion.
> ___
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: Private Information Retrieval

2022-01-29 Thread Imre Samu
> I started using WikiData for Private Information Retrieval.

What is the "root cause" of your problems?   (
https://en.wikipedia.org/wiki/Root_cause_analysis )

Now the "privacy policy" is based on U.S. law (
https://foundation.wikimedia.org/wiki/Privacy_policy ) and as I know this
is not a GDPR friendly.
*"The Wikimedia Foundation is a non-profit organization based in San
Francisco, California, with servers and data centers located in the U.S. If
you decide to use Wikimedia Sites, whether from inside or outside of the
U.S., you understand that your Personal Information will be collected,
transferred, stored, processed, disclosed and otherwise used in the U.S. as
described in this Privacy Policy."*

IMHO:
- for most EU people, a GDPR-friendly ".eu" Wikidata Query Service mirror would be enough
- adding extra noise to the Wikidata queries is not an environmentally friendly solution.

best,
  Imre

Darius Runge  wrote (on 29 Jan 2022, Sat, 13:44):

> Dear all,
>
> I started using WikiData for Private Information Retrieval. This allows
> answering certain questions while maintaining a high degree of secrecy.
> Suppose you wanted to know when Einstein was born, but for some reason you
> must keep the fact that you want to know this a secret. In this case, we
> assume a threat model with perfect knowledge about the computer in use, not
> just that someone managed to log the Wikidata requests.
>
> One solution would be to request a table of every human who ever won the
> Nobel Prize (this requires the common knowledge of Einstein being a winner
> of said) with the kind of Nobel Prize, date awarded, date of birth, date of
> death unless living, nationality etc. If we let this table scroll across
> the screen and read the required entry, there would be - as far as I can
> tell - no way to learn which entry (and how many of them) is of our
> interest.
>
> I have written a simple PHP script that allows one to enter a SPARQL
> request and have it displayed as a scrolling table. Please be advised that
> this is in a very informal alpha state, and I am no professional Web
> Developer. It's a mere proof-of-concept, but feel free to try it out if the
> API quota lets you.
>
> https://darius-runge.eu/otp/request.php
>
> My question is, whether anyone of you might be interested in working with
> me on discussing practical implications of this method (how should requests
> be written to allow for the desired privacy?) or even making a better
> implementation of a tool that allows viewing the scrolling table of
> printing it out.
>
> Feel free to reply to this mailing list entry or contact me privately with
> the postal or telecommunication data provided in the footer in case you
> don't want to discuss it in public.
>
> Best,
> Darius
>
> 
> Darius Runge
> Postfach 3
> 72669 Unterensingen
> Germany
>
> Tel +49 7022 5064970
> Fax +49 7022 5064971
> Vox +49 7022 5064998 (2 min)
>
> All up-to-date contact data:
> https://darius-runge.eu___
> Wikidata mailing list -- wikidata@lists.wikimedia.org
> To unsubscribe send an email to wikidata-le...@lists.wikimedia.org
>
___
Wikidata mailing list -- wikidata@lists.wikimedia.org
To unsubscribe send an email to wikidata-le...@lists.wikimedia.org


[Wikidata] Re: Wikidata Query Service scaling update Aug 2021

2021-08-18 Thread Imre Samu
>  (i) identify and delete lower priority data (e.g. labels, descriptions,
aliases, non-normalized values, etc);

Ouch.
For me,
- as a native Hungarian: the labels, descriptions, and aliases are extremely important
- as a data user: I am using "labels" and "aliases" in my concordance tools ( mapping wikidata-ids with external ids )

So please clarify the practical meaning of *"delete"*.

Thanks in advance,
  Imre



Mike Pham  wrote (on 18 Aug 2021, Wed, 23:08):

> Wikidata community members,
>
> Thank you for all of your work helping Wikidata grow and improve over the
> years. In the spirit of better communication, we would like to take this
> opportunity to share some of the current challenges Wikidata Query Service
> (WDQS) is facing, and some strategies we have for dealing with them.
>
> WDQS currently risks failing to provide acceptable service quality due to
> the following reasons:
>
>
> 1. Blazegraph scaling
>    1. Graph size. WDQS uses Blazegraph as our graph backend. While Blazegraph can theoretically support 50 billion edges, in reality Wikidata is the largest graph we know of running on Blazegraph (~13 billion triples), and there is a risk that we will reach a size limit of what it can realistically support. Once Blazegraph is maxed out, WDQS can no longer be updated. This will also break Wikidata tools that rely on WDQS.
>    2. Software support. Blazegraph is end of life software, which is no longer actively maintained, making it an unsustainable backend to continue moving forward with long term.
>
>
> Blazegraph maxing out in size poses the greatest risk for catastrophic
> failure, as it would effectively prevent WDQS from being updated further,
> and inevitably fall out of date. Our long term strategy to address this is
> to move to a new graph backend that best meets our WDQS needs and is
> actively maintained, and begin the migration off of Blazegraph as soon as a
> viable alternative is identified.
>
> In the interim period, we are exploring disaster mitigation options for
> reducing Wikidata’s graph size in the case that we hit this upper graph
> size limit: (i) identify and delete lower priority data (e.g. labels,
> descriptions, aliases, non-normalized values, etc); (ii) separate out
> certain subgraphs (such as Lexemes and/or scholarly articles). This would
> be a last resort scenario to keep Wikidata and WDQS running with reduced
> functionality while we are able to deploy a more long-term solution.
>
>
>
> 1. Update and access scaling
>    1. Throughput. WDQS is currently trying to provide fast updates, and fast unlimited queries for all users. As the number of SPARQL queries grows over time alongside graph updates, WDQS is struggling to sufficiently keep up in each dimension of service quality without compromising anywhere. For users, this often leads to timed out queries.
>    2. Equitable service. We are currently unable to adjust system behavior per user/agent. As such, it is not possible to provide equitable service to users: for example, a heavy user could swamp WDQS enough to hinder usability by community users.
>
>
> In addition to being a querying service for Wikidata, WDQS is also part of
> the edit pipeline of Wikidata (every edit on Wikidata is pushed to WDQS to
> update the data there). While deploying the new Flink-based Streaming
> Updater  will help with
> increasing throughput of Wikidata updates, there is a substantial risk that
> WDQS will be unable to keep up with the combination of increased querying
> and updating, resulting in more tradeoffs between update lag and querying
> latency/timeouts.
>
> In the near term, we would like to work more closely with you to determine
> what acceptable trade-offs would be for preserving WDQS functionality while
> we scale up Wikidata querying. In the long term, we will be conducting more
> user research to better understand your needs so we can (i) optimize
> querying via SPARQL and/or other methods, (ii) explore better user
> management that will allow us to prevent heavy use of WDQS that does not
> align with the goals of our movement and projects, and (iii) make it easier
> for users to set up and run their own query services.
>
> Though this information

Re: [Wikidata] 2 million queries against a Wikidata instance

2020-07-13 Thread Imre Samu
> How can I speed up the queries processing even more?

imho: drop the unwanted data as early as you can ...  ( ~ aggressive prefiltering ;  ~ do not import it )

> Any suggestion will be appreciated.

in your case ..
- I would check the RDF dumps ..
https://www.wikidata.org/wiki/Wikidata:Database_download#RDF_dumps
- I would try to write a custom pre-filter for the 2 million parameters ... ( simple text parsing ..  in GoLang, using multiple cores ... or with other fast code )
- and just load the results into PostgreSQL ..

I have good experience parsing and filtering the wikidata JSON dump (gzipped) .. and loading the result into a PostgreSQL database ..
I can run the full code on my laptop, and the result in my case is ~ 12 GB in PostgreSQL ...

The biggest problem is the memory requirement of the "2 million parameters" .. but you can choose some fast key-value storage .. like RocksDB ...
or there are other low-tech parsing solutions ...  ( a small sketch of the prefiltering idea is below )
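
A minimal sketch of the prefiltering idea ( the file names and the qids.txt format are my assumptions, not real paths; this is just the shape of the approach, not the real code ):

import gzip
import json

# the "2 million parameters": one wanted Q-id per line (hypothetical file)
with open("qids.txt", encoding="utf-8") as f:
    wanted = {line.strip() for line in f}

with gzip.open("latest-all.json.gz", "rt", encoding="utf-8") as dump, \
     open("filtered.tsv", "w", encoding="utf-8") as out:
    for line in dump:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):          # skip the array brackets / empty lines
            continue
        entity = json.loads(line)
        if entity["id"] in wanted:          # aggressive prefiltering
            out.write(entity["id"] + "\t" + json.dumps(entity) + "\n")
# filtered.tsv can then be bulk-loaded into PostgreSQL with COPY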

Best,
 Imre



Adam Sanchez  wrote (on 13 Jul 2020, Mon, 19:42):

> Hi,
>
> I have to launch 2 million queries against a Wikidata instance.
> I have loaded Wikidata in Virtuoso 7 (512 RAM, 32 cores, SSD disks with
> RAID 0).
> The queries are simple, just 2 types.
>
> select ?s ?p ?o {
> ?s ?p ?o.
> filter (?s = ?param)
> }
>
> select ?s ?p ?o {
> ?s ?p ?o.
> filter (?o = ?param)
> }
>
> If I use a Java ThreadPoolExecutor takes 6 hours.
> How can I speed up the queries processing even more?
>
> I was thinking :
>
> a) to implement a Virtuoso cluster to distribute the queries or
> b) to load Wikidata in a Spark dataframe (since Sansa framework is
> very slow, I would use my own implementation) or
> c) to load Wikidata in a Postgresql table and use Presto to distribute
> the queries or
> d) to load Wikidata in a PG-Strom table to use GPU parallelism.
>
> What do you think? I am looking for ideas.
> Any suggestion will be appreciated.
>
> Best,
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Timeout on example queries

2019-10-12 Thread Imre Samu
> The "wikibase:label" service is know to consume and it timeouts often

imho: we can use a subquery for optimization, so the label service doesn't have to do as much work:

https://w.wiki/9zD ( LIMIT 1000 + Labels + Descriptions )

LIMIT 1   -  6924 results in  59861 ms  (  birthday = 13 October 2014 )
LIMIT 1000   -1000 results in  12050 ms

the ideal case would be if the SPARQL engine could detect and automatically optimize this type of query.

Imre






Nicolas VIGNERON  wrote (on 12 Oct 2019, Sat, 19:01):

> Hi,
>
> The "wikibase:label" service is know to consume and it timeouts often
> (nothing new it happens since the beginning ; but as Wikidata grows, it's
> more or more visible).
>
> This simplified version of the query works (without intermediate variables
> and without the service): https://w.wiki/9yR
>
> Cheers,
> ~nicolas
>
> On Sat, 12 Oct 2019 at 18:41, Fabrizio Carrai  wrote:
>
>> I noticed few timeouts on example queries, for example
>>
>> Whose birthday is today ?
>> 
>>
>> Is it correct to have this condition ?
>> --
>> *Fabrizio*
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Cannot add identifiers

2019-09-15 Thread Imre Samu
> but I only found "add statement" at the end of the page

yes,   just use "add statement" and start typing the name of the identifier ( twitter, spotify, .. )
sorry,  I have already added your Spotify ID (
https://open.spotify.com/artist/7vRPVTBSYfRbS88Jx9duh5/about )



Raffaele Costanzo  wrote (on 15 Sep 2019, Sun, 11:17):

> Good afternoon,
>
> I'm Raffaele, AKA Japponcino.
> Because of I'm an artist, I created with the help of a friend my Wikidata
> page.
> My Wikidata page is this: https://www.wikidata.org/wiki/Q64154408
> My friend suggested me to add identifiers like socials, so I went to the
> page I listed above and I tried to add identifiers, but I only found "add
> statement" at the end of the page, while my friend has identifiers and can
> add them. His page is this: https://www.wikidata.org/wiki/Q61758300
>
> How can be this issue solved?
>
> Best regards,
> Raffaele
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Proposal for the introduction of a practicable Data Quality Indicator in Wikidata

2019-08-29 Thread Imre Samu
Hi Sebastian,

>Is there a list of geodata issues, somewhere? Can you give some example?

My main "pain" points:

- the cebuano geo duplicates:
https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2017/10#Cebuano
https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2018/06#A_proposed_course_of_action_for_dealing_with_cebwiki/svwiki_geographic_duplicates

- detecting "anonym" editings  of the wikidata labels from wikidata JSON
dumps.  As I know - Now it is impossible, - no similar  information in the
JSON dump, so I cant' create a score.
  This is similar problem like the original posts ; ( ~ quality score )
 but I would like to use the original editing history and
implementing/tuning my scoring algorithm.

  When somebody renaming some city names (trolls) , then my matching
algorithm not find them,
  and in this cases I can use the previous "better" state of the wikidata.
  It is also important for merging openstreetmap place-names with wikidata
labels for end users.



> Do you have a reference dataset as well, or would that be NaturalEarth
itself?

Sorry, I don't have a reference dataset.  And NaturalEarth is only a subset of "reality": it does not contain all cities, rivers, ...
But maybe you can use OpenStreetMap as the best resource.
Sometimes I am adding Wikidata concordances to the https://www.whosonfirst.org/ (WOF) gazetteer, but that data originated mostly from similar sources ( geonames, .. ), so it can't be used as a quality indicator.

If you need an easy example - probably the "airports" are a good start for checking Wikidata completeness.
( p238_iata_airport_code ; p239_icao_airport_code ; p240_faa_airport_code ; p931_place_served ;  p131_located_in )

> What would help you to measure completeness for adding concordances to
NaturalEarth.

I have created my own tools/scripts, because waiting for the community to fix the cebwiki data problems takes a lot of time.

I am importing the Wikidata JSON dumps to PostGIS ( SPARQL is not so flexible/scalable for geo matching ),
- adding some scoring based on cebwiki / srwiki ...
- creating some sheets for manual checking.
but this process is like a ~ "fuzzy left join" ...  with a lot of hacky code and manual tuning.

If I don't find some NaturalEarth/WOF object in Wikidata, then I have to debug it manually.
The most common problems are:
- different transliterations / spellings / English vs. local names ...
- some trolling by anonymous users ( mostly from mobile phones ),
- problems with the GPS coordinates,
- changes in the real data ( cities joining / splitting ), so it needs a lot of background research.

best,
Imre











Sebastian Hellmann  wrote (on 28 Aug 2019, Wed, 11:11):

> Hi Imre,
>
> we can encode these rules using the JSON MongoDB database we created in
> GlobalFactSync project (
> https://meta.wikimedia.org/wiki/Grants:Project/DBpedia/GlobalFactSyncRE).
> As  basis for the GFS Data Browser. The database has open read access.
>
> Is there a list of geodata issues, somewhere? Can you give some example?
> GFS focuses on both: overall quality measures and very domain specific
> adaptations. We will also try to flag these issues for Wikipedians.
>
> So I see that there is some notion of what is good and what not by source.
> Do you have a reference dataset as well, or would that be NaturalEarth
> itself? What would help you to measure completeness for adding concordances
> to NaturalEarth.
>
> -- Sebastian
> On 24.08.19 21:26, Imre Samu wrote:
>
> For geodata ( human settlements/rivers/mountains/... )  ( with GPS
> coordinates ) my simple rules:
> - if it has a  "local wikipedia pages" or  any big
> lang["EN/FR/PT/ES/RU/.."]  wikipedia page ..  than it is OK.
> - if it is only in "cebuano" AND outside of "cebuano BBOX" ->  then 
> this is lower quality
> - only:{shwiki+srwiki} AND outside of "sh"&"sr" BBOX ->  this is lower
> quality
> - only {huwiki} AND outside of CentralEuropeBBOX -> this is lower quality
> - geodata without GPS coordinate ->  ...
> - 
> so my rules based on wikipedia pages and languages areas ...  and I prefer
> wikidata - with local wikipedia pages.
>
> This is based on my experience - adding Wikidata ID concordances to
> NaturalEarth ( https://www.naturalearthdata.com/blog/ )
>
> --
> All the best,
> Sebastian Hellmann
>
> Director of Knowledge Integration and Linked Data Technologies (KILT)
> Competence Center
> at the Institute for Applied Informatics (InfAI) at Leipzig University
> Executive Director of the DBpedia Association
> Projects: http://dbpedia.org, http://nlp2rdf.org,
> http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
> <http://www.w3.org/community/ld4lt>
> Homepage: http://aksw.org/SebastianHellmann
> Research Group: http://aksw.org
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Proposal for the introduction of a practicable Data Quality Indicator in Wikidata

2019-08-24 Thread Imre Samu
TLDR:  it would be useful, but extremely hard to create rules for every domain.

>4. How to calculate and represent them?

imho:  it depends on the data domain.

For geodata ( human settlements/rivers/mountains/... )  ( with GPS coordinates ) my simple rules are:
- if it has a "local wikipedia page" or any big-language ["EN/FR/PT/ES/RU/.."] wikipedia page ..  then it is OK.
- if it is only in "cebuano" AND outside of the "cebuano BBOX" ->  then this is lower quality
- only {shwiki+srwiki} AND outside of the "sh"&"sr" BBOX ->  this is lower quality
- only {huwiki} AND outside of the CentralEuropeBBOX -> this is lower quality
- geodata without GPS coordinates ->  ...
- ...
so my rules are based on wikipedia pages and language areas ...  and I prefer wikidata items with local wikipedia pages ( a rough sketch of these rules as code follows below ).

This is based on my experience - adding Wikidata ID concordances to NaturalEarth ( https://www.naturalearthdata.com/blog/ )
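
A rough sketch of these rules as code ( the bounding-box values and the helper names are made up for illustration, not my real values ):

# rough bounding boxes (min_lon, min_lat, max_lon, max_lat) - illustrative values only
BBOX = {
    "ceb":   (116.0,  4.0, 127.0, 21.0),   # roughly the Philippines
    "sh_sr": ( 13.0, 41.0,  24.0, 47.0),   # roughly the sh/sr language area
    "hu":    (  9.0, 44.0,  27.0, 51.0),   # roughly Central Europe
}
BIG_WIKIS = {"enwiki", "frwiki", "ptwiki", "eswiki", "ruwiki"}

def in_bbox(lon, lat, box):
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def geo_quality(sitelinks, local_wiki, lon=None, lat=None):
    """sitelinks: set of site ids; local_wiki: e.g. 'huwiki' for a place in Hungary."""
    if lon is None or lat is None:
        return "lower quality: no GPS coordinate"
    if local_wiki in sitelinks or sitelinks & BIG_WIKIS:
        return "OK"
    if sitelinks == {"cebwiki"} and not in_bbox(lon, lat, BBOX["ceb"]):
        return "lower quality"
    if sitelinks and sitelinks <= {"shwiki", "srwiki"} and not in_bbox(lon, lat, BBOX["sh_sr"]):
        return "lower quality"
    if sitelinks == {"huwiki"} and not in_bbox(lon, lat, BBOX["hu"]):
        return "lower quality"
    return "OK"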


>5. Which is the most suitable way to further discuss and implement this
idea?

imho:  load the wikidata dump into a local database,
and create
- some "proof of concept" quality data indicators,
- some "meta" rules,
- some "real" statistics,
so the community can decide whether it is useful or not.



Imre







Uwe Jung  wrote (on 24 Aug 2019, Sat, 14:55):

> Hello,
>
> As the importance of Wikidata increases, so do the demands on the quality
> of the data. I would like to put the following proposal up for discussion.
>
> Two basic ideas:
>
>1. Each Wikidata page (item) is scored after each editing. This score
>should express different dimensions of data quality in a quickly manageable
>way.
>2. A property is created via which the item refers to the score value.
>Certain qualifiers can be used for a more detailed description (e.g. time
>of calculation, algorithm used to calculate the score value, etc.).
>
>
> The score value can be calculated either within Wikibase after each data
> change or "externally" by a bot. For the calculation can be used among
> other things: Number of constraints, completeness of references, degree of
> completeness in relation to the underlying ontology, etc. There are already
> some interesting discussions on the question of data quality which can be
> used here ( see  https://www.wikidata.org/wiki/Wikidata:Item_quality;
> https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality, etc).
>
> Advantages
>
>- Users get a quick overview of the quality of a page (item).
>- SPARQL can be used to query only those items that meet a certain
>quality level.
>- The idea would probably be relatively easy to implement.
>
>
> Disadvantage:
>
>- In a way, the data model is abused by generating statements that no
>longer describe the item itself, but make statements about the
>representation of this item in Wikidata.
>- Additional computing power must be provided for the regular
>calculation of all changed items.
>- Only the quality of pages is referred to. If it is insufficient, the
>changes still have to be made manually.
>
>
> I would now be interested in the following:
>
>1. Is this idea suitable to effectively help solve existing quality
>problems?
>2. Which quality dimensions should the score value represent?
>3. Which quality dimension can be calculated with reasonable effort?
>4. How to calculate and represent them?
>5. Which is the most suitable way to further discuss and implement
>this idea?
>
>
> Many thanks in advance.
>
> Uwe Jung  (UJung )
> www.archivfuehrer-kolonialzeit.de/thesaurus
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] "Wikidata item" link to be moved in the menu column on Wikimedia projects

2019-08-08 Thread Imre Samu
>Easy, my user interface is English in all of them.

thanks,
- settings the "Global preferences" solved my problem. ( it was not set in
my case )

Imre








Gerard Meijssen  wrote (on 8 Aug 2019, Thu, 17:19):

> Hoi,
> Easy, my user interface is English in all of them.
> Thanks,
>   GerardN
>
> On Thu, 8 Aug 2019 at 16:39, Imre Samu  wrote:
>
>> *> Suggestion* display the Q number in the link i.e. the user doesnt
>> have to click the link to see the Q  number
>>
>> +1
>> And sometimes extreme hard to find the wikidata link,
>>
>> in my case:
>> - if you don't know the letters of different languages, and you want
>> cleaning the duplicated wikidata_id-s
>>
>> Quiz -  try to find the Wikidata id ( Link )  as fast as you can:
>> - https://fa.wikipedia.org/wiki/%D8%B3%DA%AF%D8%AF
>> - https://zh.wikipedia.org/wiki/%E5%A1%9E%E6%A0%BC%E5%BE%B7
>> -
>> https://th.wikipedia.org/wiki/%E0%B9%81%E0%B8%8B%E0%B9%81%E0%B8%81%E0%B9%87%E0%B8%94
>>
>> Regards,
>> Imre
>>
>> Magnus Sälgö  wrote (on 8 Aug 2019, Thu, 13:49):
>>
>>> *Suggestion* display the Q number in the link i.e. the user doesnt have
>>> to click the link to see the Q  number
>>>
>>> *Motivation:* more and more institution start use the Wikidata Qnumber
>>> and as we today display VIAF numbers, LCCN, GND numbers etc. for authority
>>> data I think we should make it easier to see that this WIkipedia article
>>> has a specific Q number
>>>
>>> Regards
>>> Magnus Sälgö
>>> Stockholm, Sweden
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] "Wikidata item" link to be moved in the menu column on Wikimedia projects

2019-08-08 Thread Imre Samu
*> Suggestion* display the Q number in the link i.e. the user doesnt have
to click the link to see the Q  number

+1
And sometimes extreme hard to find the wikidata link,

in my case:
- if you don't know the letters of different languages, and you want
cleaning the duplicated wikidata_id-s

Quiz -  try to find the Wikidata id ( Link )  as fast as you can:
- https://fa.wikipedia.org/wiki/%D8%B3%DA%AF%D8%AF
- https://zh.wikipedia.org/wiki/%E5%A1%9E%E6%A0%BC%E5%BE%B7
-
https://th.wikipedia.org/wiki/%E0%B9%81%E0%B8%8B%E0%B9%81%E0%B8%81%E0%B9%87%E0%B8%94

Regards,
Imre

Magnus Sälgö  wrote (on 8 Aug 2019, Thu, 13:49):

> *Suggestion* display the Q number in the link i.e. the user doesnt have
> to click the link to see the Q  number
>
> *Motivation:* more and more institution start use the Wikidata Qnumber
> and as we today display VIAF numbers, LCCN, GND numbers etc. for authority
> data I think we should make it easier to see that this WIkipedia article
> has a specific Q number
>
> Regards
> Magnus Sälgö
> Stockholm, Sweden
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Python tools should use user-agent to access WDQS

2019-07-04 Thread Imre Samu
> If you're a tool builder and encountering issues with WDQS at the moment,
> please check that your tool is compliant with those guidelines.

I like and am using the WDQS "Code" generator feature.

And as I see it, two Python-related code snippets can be generated - but I don't see any "user agent" in the generated code.

Is it possible to add an example "user agent" to the generated code?
 ( ~ defaulting to the logged-in user name or my IP address )


from SPARQLWrapper import SPARQLWrapper, JSON
endpoint_url = "https://query.wikidata.org/sparql"
sparql_user_agent = "myWikidataUserName"  # see the User-Agent policy
query = """#Cats

}"""

thanks in advance,
  Imre




Léa Lacroix  wrote (on 4 Jul 2019, Thu, 15:53):

> Hello all,
>
> Since we recently received several messages about tools unable to access
> the Query Service, I wanted to remind you about this email from Guillaume,
> and especially this part:
>
> "As a reminder, any bot should use a user agent that allows to identify
> it: https://meta.wikimedia.org/wiki/User-Agent_policy.";
>
> If you're a tool builder and encountering issues with WDQS at the moment,
> please check that your tool is compliant with those guidelines.
>
> Thanks for your understanding,
> Léa
>
> -- Forwarded message -
> From: Guillaume Lederrey 
> Date: Thu, 13 Jun 2019 at 19:53
> Subject: [Wikidata] Overload of query.wikidata.org
> To: Discussion list for the Wikidata project. <
> wikidata@lists.wikimedia.org>
>
>
> Hello all!
>
> We are currently dealing with a bot overloading the Wikidata Query
> Service. This bot does not look actively malicious, but does create
> enough load to disrupt the service. As a stop gap measure, we had to
> deny access to all bots using python-request user agent.
>
> As a reminder, any bot should use a user agent that allows to identify
> it [1]. If you have trouble accessing WDQS, please check that you are
> following those guidelines.
>
> More information and a proper incident report will be communicated as
> soon as we are on top of things again.
>
> Thanks for your understanding!
>
>Guillaume
>
>
> [1] https://meta.wikimedia.org/wiki/User-Agent_policy
>
> --
> Guillaume Lederrey
> Engineering Manager, Search Platform
> Wikimedia Foundation
> UTC+2 / CEST
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
> --
> Léa Lacroix
> Project Manager Community Communication for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt
> für Körperschaften I Berlin, Steuernummer 27/029/42207.
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Overload of query.wikidata.org (Guillaume Lederrey)

2019-06-21 Thread Imre Samu
good news :)
-  the issue has been fixed in the 0.6.7 release - and it is working again!
https://github.com/nichtich/wikidata-taxonomy/commit/97abd4158b3c4ba9cd2c53503ca6b8b2ca29bc2a

Imre



Stas Malyshev  wrote (on 19 Jun 2019, Wed, 0:22):

> Hi!
>
> On 6/18/19 2:29 PM, Tim Finin wrote:
> > I've been using wdtaxonomy
> >  happily for many months
> > on my macbook. Starting yesterday, every call I make (e.g., "wdtaxonomy
> > -c Q5") produces an immediate "SPARQL request failed" message.
>
> Could you provide more details, which query is sent and what is the full
> response (including HTTP code)?
>
> >
> > Might these requests be blocked now because of the new WDQS policies?
>
> One thing I may think of it that this tool does not send the proper
> User-Agent header. According to
> https://meta.wikimedia.org/wiki/User-Agent_policy, all clients should
> identify with valid user agent. We've started enforcing it recently, so
> maybe this tool has this issue. If not, please provide the data above.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Language codes for Chinese

2019-06-18 Thread Imre Samu
Hi Vlad,

I see the following  zh- language codes : https://w.wiki/55W

zh
zh-classical
zh-cn
zh-hans
zh-hant
zh-hk
zh-min-nan
zh-mo
zh-my
zh-sg
zh-tw
zh-yue

but the better list is:
https://www.wikidata.org/wiki/Help:Wikimedia_language_codes/lists/all
It is complicated because:  zh-yue = yue ;  zh-min-nan = nan ;  zh-classical = lzh

>But apparently there is no such values as 'zh-cn', 'zh-nans' etc.

I see "zh-cn" and "zh-hans" labels..
example:  Q2 Earth: https://www.wikidata.org/wiki/Special:EntityData/Q2.json

,

"zh-cn": {
"language": "zh-cn",
"value": "地球"
},
"zh-hans": {
"language": "zh-hans",
"value": "地球"
},
"zh-sg": {
"language": "zh-sg",
"value": "地球"
},
"zh-hk": {
"language": "zh-hk",
"value": "地球"
},
"zh-tw": {
"language": "zh-tw",
"value": "地球"
},
"zh-mo": {
"language": "zh-mo",
"value": "地球"
},

or a programmer example:

# curl -s https://www.wikidata.org/wiki/Special:EntityData/Q2.json |
jq '.' | grep language | egrep 'nan|yue|lzh|zh-'

  "language": "zh-hant",
  "language": "yue",
  "language": "zh-cn",
  "language": "zh-hans",
  "language": "zh-sg",
  "language": "zh-hk",
  "language": "zh-tw",
  "language": "zh-mo",
  "language": "nan",
  "language": "lzh",
  "language": "zh-my",
  "language": "zh-hant",
  "language": "zh-hans",
  "language": "zh-cn",
  "language": "zh-sg",
  "language": "zh-hk",
  "language": "zh-tw",
  "language": "zh-mo",
  "language": "lzh",
  "language": "yue",
  "language": "zh-my",
"language": "lzh",


Best,
Imre

Vladimir Ryabtsev  wrote (on 19 Jun 2019, Wed, 2:05):

> Hello,
>
> I am looking for the list of supported language codes for variations of
> Chinese. So far in API responses I found these:
>
> zh
> zh-cn
> zh-hans
> zh-hant
> zh-hk
> zh-tw
> zh-mo
> zh-sg
>
> "Configure" link in "more languages" section leads to this page:
> https://www.wikidata.org/wiki/Help:Navigating_Wikidata/User_Options#Babel_extension
> Which in turn refers to
> https://meta.wikimedia.org/wiki/Table_of_Wikimedia_projects#Projects_per_language_codes
> But apparently there is no such values as 'zh-cn', 'zh-nans' etc.
>
> How can I get the COMPLETE list of language codes (desirably with
> description) for Chinese that is supported by Wikidata?
>
> Best regards,
> Vlad
>
> P.S. I am re-sending the message as I got a reply from the previous
> attempt saying something about moderator approval, but the message have not
> got any further since then.
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Gender statistics on the Danish Wikipedia

2019-03-05 Thread Imre Samu
> In connection with an editathon, I have made statistics of the number of
women and men on the Danish Wikipedia.

The "missing" statistics also interesting  :)
*"NobelPrize winners(by genders) -  missing from "da.wikipedia.org
"*   http://tinyurl.com/y5g9zmzp (now: *missing
female=3 ; missing male=231* )

NobelPrize gender percent wikidata : 52female/855male = 0.060 (
http://tinyurl.com/y5xnh7ch )
.da wiki : 49female/626male= *0.078 * ( http://tinyurl.com/y47sepx3 )
positive discrimination towards female Nobel Prize winners !

( some SPARQL query maybe not perfect, so please double check the numbers )

Best,
Imre



 wrote (on 5 Mar 2019, Tue, 12:55):

> Dear any Wikidata Query Service expert,
>
>
> In connection with an editathon, I have made statistics of the number of
> women and men on the Danish Wikipedia. I have used WDQS for that and the
> query is listed below:
>
> SELECT ?count ?gender ?genderLabel
> WITH {
>SELECT ?gender (COUNT(*) AS ?count) WHERE {
>  ?item wdt:P31 wd:Q5 .
>  ?item wdt:P21 ?gender .
>  ?article schema:about ?item.
>  ?article schema:isPartOf 
>}
>GROUP BY ?gender
> } AS %results
> WHERE {
>INCLUDE %results
>SERVICE wikibase:label { bd:serviceParam wikibase:language "da,en". }
> }
> ORDER BY DESC(?count)
> LIMIT 25
>
> http://tinyurl.com/y8twboe5
>
> As the statistics could potentially create some discussion (and ready
> seems to have) I am wondering whether there are some experts that could
> peer review the SPARQL query and tell me if there are any issues. I hope
> I have not made a blunder...
>
> The minor issues I can think of are:
>
> - Missing gender in Wikidata. We have around 360 of these.
>
> - People on the Danish Wikipedia not on Wikidata. Probably tens-ish or
> hundreds-ish!?
>
> - People not being humans. The gendered items I sampled were all
> fictional humans.
>
>
> We previously reached 17.2% females. Now we are below 17% due to
> mass-import of Japanese football players, - as far as we can see.
>
>
> best regards
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Best way to query Wikidata

2019-01-24 Thread Imre Samu
>Is there a JAVA API to the Wikidata Query Service interface?

- go to https://query.wikidata.org
- select any example ( cats? )  +  "Execute query"
- select *"Code"* - and select *"Java"* -> and check the generated Java code

> Can SPARQL be used to query a dump of Wikidata?

maybe you can set up your own SPARQL service ( check http://wikiba.se/ )

> How to best make such queries (information search) in terms of query
response time?

for my use case ( geodata matching ) I have loaded the data into my own database:
* I have downloaded the Wikidata JSON dump
  * https://www.wikidata.org/wiki/Wikidata:Database_download#JSON_dumps_(recommended)
  * https://dumps.wikimedia.org/wikidatawiki/entities/
* filtered the JSON
* and loaded only the relevant elements into a Postgres/PostGIS database.

or you can load it into CouchDB:
https://github.com/maxlath/import-wikidata-dump-to-couchdb
( or you can check the other tools here:
https://www.wikidata.org/wiki/Wikidata:Tools/For_programmers )

but my use case is very special.


Best,
 Imre







Raveesh Meena  wrote (on 24 Jan 2019, Thu, 9:17):

> Hi,
>
> I have a set of specific querie that my programme shoukd make to Wikidata.
> How to best make such queries (information search) in terms of query
> response time?
>
> One approach that I am familar is to use APIs, such as Wikidata Toolkit
> API , to query documents
> in Wiki and parse each document (and perhaps follow hyperlinks) for
> obtaining the specific feature values. I have tested this, but the queries
> could take seconds. Now there could be many factors responsible for this
> such as internet connection, the scheme for parsing wikidata graph, etc.
>
> Another approach would be to use the Wikidata Query Service
>  and SPARQL queries. This is also slow
> sometimes.
>
> My questions:
>
>- Are there benchmarks on response time performance of Wikidata Query
>Service  SPARQL queries?
>- Is there a JAVA API to the Wikidata Query Service
> interface?
>- Can SPARQL be used to query a dump of Wikidata?
>
> Thanks and regards
> Raveesh
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Company data and properties

2019-01-23 Thread Imre Samu
Hi Darren,

OpenStreetMap started linking company data with Wikidata (
https://github.com/osmlab/name-suggestion-index )
( https://wiki.openstreetmap.org/wiki/Key:brand:wikidata )
current status:  https://taginfo.openstreetmap.org/keys/brand:wikidata  ( ~ 6 objects )
the project is still in its early stages, with a lot of issues & data modeling problems.

best,
 Imre




Darren Cook  wrote (on 23 Jan 2019, Wed, 11:07):

> I wondered if anyone else is actively working on fleshing out the
> company data within Wikidata?
>
> Is
> https://www.wikidata.org/wiki/Wikidata:WikiProject_Companies/Properties
> the best reference guide?
>
> Looking to chat, on- or off-list, with interested people, hear about
> academic research being done, challenges you are facing, etc. Even set
> up a meetup if there is anyone in or near London.
>
> As background we're organizing (paying) some people to improve the
> quantity and quality in a couple of areas (*), with the ulterior motive
> of reaching a critical mass, so the information becomes useful and other
> users or even the companies themselves start maintaining it.
>
> Some of the above page is still vague, and the examples are simple. We
> want to be able to cope with complex business structures, joint-venture
> child companies, companies in multiple industries, with many brands and
> products, etc.
>
> Darren
>
>
> *: Not coincidentally, areas that are useful to a couple of our clients.
> "We" is my company, QQ Trend Ltd., a small data/AI company.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikibase as a decentralized perspective for Wikidata

2018-11-29 Thread Imre Samu
> More specifically, at OSM that's the only Q-numbers people are aware of.

I would like to share my use case  ( sorry if it is sometimes off-topic )

I am:
- a member of Wikimédia Magyarország Egyesület (Wikimedia Hungary)
- an OSM meetup organizer
- in my mind:  'Q' == Wikidata ;  'Q' == Quality  ( but this is a false association )
- I have experience working with data warehousing / relational databases

The Q/P prefix for me is like https://en.wikipedia.org/wiki/Hungarian_notation :

*"Hungarian notation aims to remedy this by providing the programmer with explicit knowledge of each variable's data type."*
but now I am not sure:
- what is the real meaning of the Q/P prefix  ->  Wikidata or Wikibase?


I am involved in some open geodata projects:
#1. adding Wikidata ID concordances to Natural Earth ( this is my work )
https://www.naturalearthdata.com/blog/miscellaneous/natural-earth-v4-1-0-release-notes/
#2. adding Wikidata ID concordances to https://whosonfirst.org/ ( Who's On First is a gazetteer of places. )
#3. OSM

The first time, I tried SPARQL + the Wikidata Query Service.
My experience:
- more and more data -> ( like Q486972, human settlement ) -> more timeouts ( in my complex geo queries )
  ( a lot of farms were imported in the Netherlands area, so I have to limit the search radius; ... )
- the data changes all the time, so it is hard to write and validate complex program code.
After a few months, I learned that for heavy data users the Wikidata Query Service is sometimes not perfect ( but it is good for light queries! ).

So now I am loading the "Wikidata JSON dump" into a Postgres/PostGIS database, and I am writing complex code in SQL.
My code is very complex ( jaro_winkler distance, geo distance, detecting Cebuano imports; ranking multiple candidates for matching );
and finally I can control the performance of the system ( no timeouts ) and I have reproducible results.

for example, my simple SQL example code - you can see a lot of P/Q codes inside,
and as you can expect, by now I know a lot of Q/P codes by heart!
select
wd_id
,wd_label
,get_wdcqv_globecoordinate(data,'P625','P518','Q1233637') as river_mouth
,get_wdcqv_globecoordinate(data,'P625','P518','Q7376362') as river_source
from wd.wdx
where wd_id='Q626';


And now the "Natural Earth" tables look like this ( relational database ):
+-------------+------------+-----------+
| name        | wikidataid | iata_code |
+-------------+------------+-----------+
| Birsa Munda | Q598231    | IXR       |
| Barnaul     | Q1858312   | BAX       |
| Bareilly    | Q2788745   |           |
+-------------+------------+-----------+

this is my current workflow.

But my real nightmare will start if other databases start using the Q/P prefix:
for example, if other airport-related databases start using Wikibase - with Q codes:
-  http://ourairports.com/
-  https://www.flightradar24.com/data/airports
-  https://www.airnav.com/airports/

Then every airport will have at least 4 different Q codes!
And in the future, I will have to check errors in this spreadsheet ( and sometimes I don't see the header ):
+-------------+------------+-----------+-------------+-----------+--------+
| name        | wikidataid | iata_code | ourairports | flightR24 | AirNav |
+-------------+------------+-----------+-------------+-----------+--------+
| Birsa Munda | Q598231    | IXR       | Q325324     | Q973      | Q1     |
| Barnaul     | Q1858312   | BAX       | Q42         | Q1        | Q8312  |
| Bareilly    | Q2788745   |           | Q1          | Q31       | Q45    |
+-------------+------------+-----------+-------------+-----------+--------+

Q1 - everywhere - with different meanings.

And what if some users want to add the new airport IDs back to Wikidata ( linking databases )?  Why not.
So in the future, if I check https://www.wikidata.org/wiki/Q598231
I will see a lot of different Q codes:
  Ourairports  Q325324
  FlightR24    Q973
  AirNav       Q1

And sometimes it is very hard to explain to new contributors that Q1(AirNav) =/= Q1(Wikidata).

If I see any database/spreadsheet
- and I see a Q code - my current expectation is that it is a Wikidata code.  :)
Just check:  https://github.com/search?q=Q28+hungary&type=Code

So my current opinion:
- please don't use Q/P prefixes in any new/other databases!

For me, unlearning a lot of Q/P values is hard,
so as I gain more and more experience with the Wikidata data model, I would like to use other Wikibase systems with similar Q/P prefixes less and less.


My other pain point is the "Wikidata JSON dump";  a little more information would be a big help for me:

for detecting the data quality of items:
- last modification DateTime
- last modification user type ( anonym_user,  new_user,  experienced_user,  bot )
- edit counts by user type, for example:  { anonym_user=2 ,  new_user=0 ,  experienced_user=0,  bot=15 }
info about the Wikidata life cycle:
- Wikidata redirections / deletions  ( now: only in the .ttl files )


I know I am not a typical user ...  and my problems are not a priority yet,

imho:

Integrating Wikidata IDs into other databases has already started ( OSM,
Natur

Re: [Wikidata] WikiData use and users

2018-06-01 Thread Imre Samu
> make use of WikiData to power their projects

Some (geo) related projects:
- NaturalEarth   : Free vector and raster map data
https://www.naturalearthdata.com/blog/miscellaneous/natural-earth-v4-1-0-release-notes/
- Who's On First : a gazetteer of places : https://www.whosonfirst.org/ ( wikidata concordances )
- OpenStreetMap  : https://wiki.openstreetmap.org/wiki/Key:wikidata
- Mapbox(Startup): https://www.mapbox.com/blog/tags/wikidata/
- ...

And a lot of Github projects:
https://github.com/search?q=wikidata&type=Topics

Imre




2018-06-01 8:38 GMT+02:00 Heather Ford :

> Hi there,
>
> I'm doing some research on WikiData and wondering whether there is a list
> of projects/sites that either a) make use of WikiData to power their
> projects or that b) WikiData extracts data from in order to populate items.
> I can see some projects listed in External Tools [1] but can't seem to find
> lists of projects beyond this.
>
> Can anyone help?
>
> Many thanks.
>
> Best,
> Heather.
>
> [1] https://www.wikidata.org/wiki/Wikidata:Tools/External_tools
>
> Dr Heather Ford
> Senior Lecturer, School of Arts & Media ,
> University of New South Wales
> w: hblog.org / EthnographyMatters.net  /
> t: @hfordsa 
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] RDF: All vs Truthy

2017-12-03 Thread Imre Samu
> « Ranking » is a Wikibase feature to deal with this. If one of the
statement is ranked « preferred »,
> typically the one valid at present time, then it will be the only one
present in a typical query result or in an infobox extraction.

Thank you  :)

one more question:
As a human, how can I check - for example - Russia's "Truthy=simple" statements?  Is there any website?

The https://www.wikidata.org/wiki/Q159 link shows all statements; I want to see only the "Truthy=simple" statements.
What is the best practice for debugging?

Thanks in advance,
  Imre



2017-12-03 15:10 GMT+01:00 Thomas Douillard :

> « Ranking » is a Wikibase feature to deal with this. If one of the
> statement is ranked « preferred », typically the one valid at present time,
> then it will be the only one present in a typical query result or in an
> infobox extraction.
>
> 2017-12-03 14:49 GMT+01:00 Imre Samu :
>
>> >All=contains not only the Truthy ones, but also the ones with qualifiers
>>
>> imho:  Sometimes Qualifiers is very important for multiple values  (
>>  like "Start time","End time","point in time", ... )
>> for example:   Russia https://www.wikidata.org/wiki/Q159  :  Russia -
>> P38:"currency"
>> has 2 "statements" both with qualifiers:
>>
>> * Russian ruble -  ( start time: 1992 )
>> * Soviet ruble  - (end time: September 1993 )
>>
>> My Question:
>> in this case - what is the "Truthy=simple" result for
>>  Russia-P38:"currency" ?
>>
>>
>> Regards,
>>  Imre
>>
>>
>>
>> 2017-12-03 7:54 GMT+01:00 Fariz Darari :
>>
>>> Truthy=simple, direct, only Subject-Predicate-Object structure
>>>
>>> For example: wd:Q76127 wdt:P26 wd:Q468519 (= Sukarno hasSpouse Fatmawati)
>>>
>>> All=contains not only the Truthy ones, but also the ones with qualifiers
>>> (= how long was the marriage? when did the marriage happen?), references
>>> (sources to support the claim), and preferences (in case of multiple
>>> values, one might be preferred -- think of multiple birth dates of some
>>> people).
>>>
>>> -fariz
>>>
>>> Regards,
>>> Fariz
>>>
>>> On Sun, Dec 3, 2017 at 1:49 PM, Laura Morales  wrote:
>>>
>>>> Can somebody please explain (in simple terms) what's the difference
>>>> between "all" and "truthy" RDF dumps? I've read the explanation available
>>>> on the wiki [1] but I still don't get it.
>>>> If I'm just a user of the data, because I want to retrieve information
>>>> about a particular item and link items with other graphs... what am I
>>>> missing/leaving-out by using "truthy" instead of "all"?
>>>> A practical example would be appreciated since it will clarify things,
>>>> I suppose.
>>>>
>>>> [1] https://www.wikidata.org/wiki/Wikidata:Database_download#RDF_dumps
>>>>
>>>> ___
>>>> Wikidata mailing list
>>>> Wikidata@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>
>>>
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] RDF: All vs Truthy

2017-12-03 Thread Imre Samu
>All=contains not only the Truthy ones, but also the ones with qualifiers

imho:  sometimes the qualifiers are very important for multiple values  ( like "start time", "end time", "point in time", ... )
for example:  Russia https://www.wikidata.org/wiki/Q159 :  Russia - P38:"currency"
has 2 "statements", both with qualifiers:

* Russian ruble - ( start time: 1992 )
* Soviet ruble  - ( end time: September 1993 )

My question:
in this case - what is the "Truthy=simple" result for Russia - P38:"currency" ?


Regards,
 Imre



2017-12-03 7:54 GMT+01:00 Fariz Darari :

> Truthy=simple, direct, only Subject-Predicate-Object structure
>
> For example: wd:Q76127 wdt:P26 wd:Q468519 (= Sukarno hasSpouse Fatmawati)
>
> All=contains not only the Truthy ones, but also the ones with qualifiers
> (= how long was the marriage? when did the marriage happen?), references
> (sources to support the claim), and preferences (in case of multiple
> values, one might be preferred -- think of multiple birth dates of some
> people).
>
> -fariz
>
> Regards,
> Fariz
>
> On Sun, Dec 3, 2017 at 1:49 PM, Laura Morales  wrote:
>
>> Can somebody please explain (in simple terms) what's the difference
>> between "all" and "truthy" RDF dumps? I've read the explanation available
>> on the wiki [1] but I still don't get it.
>> If I'm just a user of the data, because I want to retrieve information
>> about a particular item and link items with other graphs... what am I
>> missing/leaving-out by using "truthy" instead of "all"?
>> A practical example would be appreciated since it will clarify things, I
>> suppose.
>>
>> [1] https://www.wikidata.org/wiki/Wikidata:Database_download#RDF_dumps
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata