Re: [Toolserver-l] Wikidata tables

2013-04-22 Thread Daniel Schwen
> FWIW, I have implemented a query-able stand-alone web server that keeps all
> of the wikidata property-item-links in memory. This uses the wikidata dumps

That does not sound too terribly scalable. I did the same thing
(custom webserver, data kept in memory) for the map labels of my
WikiMiniAtlas, and had to abandon this in order to be able to support
more languages. And my data is only a subset of what I expect to show
up in Wikidata.
On top of that, not being able to join data from there with other DBs
is a serious deficiency. The same goes for the suggestion to just use the
API.
Daniel

___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette

Re: [Toolserver-l] Wikidata tables

2013-04-19 Thread Platonides
On 19/04/13 01:19, DaB. wrote:
> As you may know, there is a rev_text_id field in the revision table. This
> field points to the text table where the actual text is, or should be,
> because the WMF doesn't store the text there, only a pointer
> (DB://cluster25/11458305, for example). If you query different wikis you
> will see that most of them point to the same cluster or to one with a
> nearby number. That tells me (and I was also told so before) that all text
> of all WMF projects is stored together.
> The task would now be to separate Wikidata from the rest, but the storage
> area has no clue where a text comes from, which makes the separation very
> hard. And there is another problem: deleted texts are also in this area,
> so even more filtering would be needed.
> I very much doubt that this situation will change at the TS, and I also
> doubt that it will be different for WikiLabs. So I guess your best bet is
> the API here.
>
> Sincerely,
> DaB.

I think the only hope would be if Wikidata were stored in its own
cluster (for easier differentiation) and at least one server of that
group (the master?) held only that (so the toolserver could get its binlogs).


Re: [Toolserver-l] Wikidata tables

2013-04-19 Thread Magnus Manske
FWIW, I have implemented a query-able stand-alone web server that keeps all
of the wikidata property-item-links in memory. This uses the wikidata dumps
which appear to be rather frequent. I'll try to deploy a test version on
WikiLabs (once I figure out how all that works); it seems to be more
favourable to such services than the toolserver.
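
As a rough illustration of the idea (not the actual implementation, which is not shown in this thread), a minimal in-memory index of property-item links could look like the Python sketch below. The `LinkIndex` class, its method names, and the sample IDs are all assumptions made for illustration; loading from the real dumps is left out.

```python
from collections import defaultdict


class LinkIndex:
    """Hypothetical in-memory index of (item, property, target item) links."""

    def __init__(self):
        # (property, target) -> set of items that carry that statement
        self._by_claim = defaultdict(set)
        # item -> set of (property, target) pairs
        self._by_item = defaultdict(set)

    def add(self, item, prop, target):
        self._by_claim[(prop, target)].add(item)
        self._by_item[item].add((prop, target))

    def items_with(self, prop, target):
        """All items that have a statement `prop -> target`."""
        return sorted(self._by_claim.get((prop, target), ()))

    def claims_of(self, item):
        """All (property, target) pairs recorded for `item`."""
        return sorted(self._by_item.get(item, ()))


# Made-up sample triples, not real Wikidata content.
index = LinkIndex()
for triple in [("Q1", "P10", "Q100"), ("Q2", "P10", "Q100"), ("Q2", "P20", "Q200")]:
    index.add(*triple)

print(index.items_with("P10", "Q100"))
```

Keeping two dictionaries trades memory for lookup speed in both directions, which is the scalability cost raised elsewhere in this thread.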



Re: [Toolserver-l] Wikidata tables

2013-04-19 Thread Kolossos

Hello,
PostgreSQL is now starting to support JSON as well, so we should try to
find a way to make Wikidata available to us, and I would prefer to keep
using SQL.

One way could be minutely diff files, which is the approach OpenStreetMap
uses. Alternatively, we could use the API for each updated article.

Any central service is better than leaving everyone to fight with the
problem alone.

I think support for hierarchical information was the key reason to use JSON
instead of a key-value store, a point that I can understand.


Greetings Tim



Re: [Toolserver-l] Wikidata tables

2013-04-19 Thread Kolossos
Sounds fine, but will it be possible to join the data with data from
other tables and other projects? These joins are the basis for a lot of
tools on the toolserver, and I'm not sure how well joins at the application
level will work.
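
To make the concern concrete, here is a minimal sketch of what a join at the application level involves: a hash join in Python over two result sets that were fetched separately. The row layout, field names, and sample values are assumptions for illustration only.

```python
def hash_join(left_rows, right_rows, key):
    """Inner-join two lists of dicts on `key`, in application code."""
    # Build a lookup table over the right-hand side.
    lookup = {}
    for row in right_rows:
        lookup.setdefault(row[key], []).append(row)
    joined = []
    # Probe the lookup table with each left-hand row.
    for row in left_rows:
        for match in lookup.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical rows: one set from a separate Wikidata service,
# one set from a wiki database query.
wikidata_rows = [{"title": "Berlin", "item": "Q-a"}, {"title": "Paris", "item": "Q-b"}]
wiki_rows = [{"title": "Berlin", "page_len": 1200}, {"title": "Rome", "page_len": 900}]

print(hash_join(wiki_rows, wikidata_rows, "title"))
```

Both inputs have to be pulled into the tool's memory before they can be matched, which is work the database would otherwise do for a normal SQL join.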


BTW: With the Templatetiger project we already handle tons of infobox
information in MySQL on the toolserver; all of this data is highly
redundant because we support a lot of languages. So there should be
enough space on the toolserver in the mid-term. But I also think that labs
is the right place to start such a project.


Greetings Tim

On 19.04.2013 10:37, Magnus Manske wrote:

> FWIW, I have implemented a query-able stand-alone web server that keeps
> all of the wikidata property-item-links in memory. This uses the
> wikidata dumps which appear to be rather frequent. I'll try to deploy a
> test version on wikilabs (once I figure out how all that works); it
> seems to be more favourable to such services than the toolserver.





Re: [Toolserver-l] Wikidata tables

2013-04-19 Thread Platonides
Sorry, but your mail doesn't make much sense.

On 19/04/13 11:14, Patricia Pintilie wrote:
> Ok so how about we recognize what the overall goal is first. Then
> establish the point that it's trying to convey. Only then can we meet in
> the middle and set a plan in motion. I can only assist when a plan of
> action is clear with a definite plan; without it I'm lost on where to
> begin. It seems as if I'm doing my natural instinct research then I get
> mail from the people I'm reading about... Very interesting this is
> because I'm left to think my mind is linked to the problems at hand. TS
> is my old signature, my server os will pull up my IP searches. Which
> leads me to believe this is why I am always being brought up in the
> middle of these outstanding conversations you guys are having lol.
> Please send detailed instructions as to how I can help, there should be a
> file known as Mila.eu also known as ro.eula. Find it and run whatever it
> has, thanks.
> -patiently waiting your response.
> -MilaStarX-TS

TS was being used in this thread as an abbreviation of ToolServer. This
mailing list is about the Wikimedia Toolserver, so no wonder that it's
mentioned a lot here. :)
I don't know if you have a toolserver account, or even if you're a
Wikimedian. I don't know what you are referring to with “my server os will
pull up my IP searches”, nor where that “file known as Mila.eu also known
as ro.eula” is supposed to be.
If you were asking for someone to make a query for you in the
toolserver, try including the request in the email, or at least a link
to what you want.

Best regards


Re: [Toolserver-l] Wikidata tables

2013-04-19 Thread Patricia Pintilie
Waterfall the Anamorphic Development.

[Toolserver-l] Wikidata tables

2013-04-18 Thread Magnus Manske
Just wondering what the status of exposing all wikidata tables on the
toolserver is.

Currently, there are a few wb_* tables with item labels, descriptions,
aliases, and language links.

But the tables (whatever they are called) containing item-to-item
connections appear to be missing. Maybe because they were added later?

Magnus

Re: [Toolserver-l] Wikidata tables

2013-04-18 Thread Lydia Pintscher
On Thu, Apr 18, 2013 at 9:52 AM, Magnus Manske
magnusman...@googlemail.com wrote:
> Just wondering what the status of exposing all wikidata tables on the
> toolserver is.
>
> Currently, there are a few wb_* tables with item labels, descriptions,
> aliases, and language links.
>
> But the tables (whatever they are called) containing item-to-item
> connections appear to be missing. Maybe because they were added later?

As far as I know they're only saved in JSON where usually the article
text is stored and not in separate tables.


Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Wikidata

Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.


Re: [Toolserver-l] Wikidata tables

2013-04-18 Thread Magnus Manske
Huh. That could be ... problematic in the future.

Thanks,
Magnus



Re: [Toolserver-l] Wikidata tables

2013-04-18 Thread Patricia Pintilie
Imagine Magnus all the files are being looked over to determine human,
machine, and non used accounts. Each needs to be looked over then proper
deletion will clean out making more room. Problematic yes but will help if
this is taken care of the right way now.

Re: [Toolserver-l] Wikidata tables

2013-04-18 Thread Byrial Jensen

On 18-04-2013 11:21, Lydia Pintscher wrote:
> On Thu, Apr 18, 2013 at 9:52 AM, Magnus Manske
> magnusman...@googlemail.com wrote:
>> Just wondering what the status of exposing all wikidata tables on the
>> toolserver is.
>>
>> Currently, there are a few wb_* tables with item labels, descriptions,
>> aliases, and language links.
>>
>> But the tables (whatever they are called) containing item-to-item
>> connections appear to be missing. Maybe because they were added later?
>
> As far as I know they're only saved in JSON where usually the article
> text is stored and not in separate tables.


You can see in the pagelinks table which properties and which items an 
item is connected to by statements, but not how the properties and items 
are paired together.
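
The difference can be sketched as follows. The statement list and its layout are assumptions for illustration (this is not the actual Wikidata JSON format, and the IDs are made up); the point is only that a flat set of link targets, as in pagelinks, loses which property goes with which item.

```python
# Hypothetical statements of one item: (property, target item) pairs.
statements = [("P10", "Q100"), ("P20", "Q200")]

# What a pagelinks-style view retains: a flat set of linked pages.
link_targets = {page for pair in statements for page in pair}

# A different pairing of the same properties and items yields the same links,
# so the pairing cannot be recovered from the links alone.
swapped = [("P10", "Q200"), ("P20", "Q100")]
swapped_targets = {page for pair in swapped for page in pair}

print(sorted(link_targets))
print(link_targets == swapped_targets)
```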



Re: [Toolserver-l] Wikidata tables

2013-04-18 Thread Daniel Schwen
I can (barely) understand why wikitext is not available on the
toolserver, but the JSON data you are talking about does not seem
copyrightable and much lower in volume. The usefulness of the wikidata
mirror on the toolserver is rather low without the actual wikiDATA.
Daniel


Re: [Toolserver-l] Wikidata tables

2013-04-18 Thread DaB.
Hello,
At Thursday 18 April 2013 22:00:54 DaB. wrote:
> but the JSON data you are talking about does not seem
> copyrightable and much lower in volume.

If this JSON data is stored where the normal wikitext is, it is impossible
for us to replicate it: because we have no access to these WMF servers, there
would be no way to separate Wikidata from the rest, and/or we would not have
enough disk space.

Sincerely,
DaB.


-- 
Userpage: [[:w:de:User:DaB.]] — PGP: 0x2d3ee2d42b255885



Re: [Toolserver-l] Wikidata tables

2013-04-18 Thread Daniel Schwen
> If this JSON data is stored where the normal wikitext is, it is impossible
To my understanding it is.

> for us to replicate it: because we have no access to these WMF servers, there
IMO that was a questionable design decision. JSON plaintext storage in
SQL is shoehorning a do-it-yourself object store onto a classical
RDBMS.
Postgres at least has hstore. This may even be a genuine use case for
one of those hipster databases (NoSQL like MongoDB etc.). But who knows
what points were taken into consideration when making this decision.

> would be no way to separate Wikidata from the rest
I don't understand why separating plaintext storage between different
projects would be an issue. Is it all lumped into one storage
namespace?
I'm sure nobody at Wikimedia would be the least bit motivated to make
this data available to the toolserver, but maybe it will be usable in
labs. Otherwise it would be quite a waste of a great opportunity.

> and/or we have not enough disk space.
If you can separate it out, I seriously doubt that Wikidata would
require storage anywhere near as large in magnitude as the other
Wikimedia projects (at least in the mid-term).


Re: [Toolserver-l] Wikidata tables

2013-04-18 Thread DaB.
Hello,
At Friday 19 April 2013 01:03:25 DaB. wrote:
>> would be no way to separate Wikidata from the rest
>
> I don't understand why separating plaintext storage between different
> projects would be an issue. Is it all lumped into one storage
> namespace?
> I'm sure nobody at Wikimedia would be the least bit motivated to make
> this data available to the toolserver, but maybe it will be usable in
> labs. Otherwise it would be quite a waste of a great opportunity.

As you may know, there is a rev_text_id field in the revision table. This
field points to the text table where the actual text is, or should be,
because the WMF doesn't store the text there, only a pointer
(DB://cluster25/11458305, for example). If you query different wikis you will
see that most of them point to the same cluster or to one with a nearby
number. That tells me (and I was also told so before) that all text of all
WMF projects is stored together.
The task would now be to separate Wikidata from the rest, but the storage
area has no clue where a text comes from, which makes the separation very
hard. And there is another problem: deleted texts are also in this area, so
even more filtering would be needed.
I very much doubt that this situation will change at the TS, and I also doubt
that it will be different for WikiLabs. So I guess your best bet is the API
here.

Sincerely,
DaB.
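
For illustration, a pointer of the form quoted above can be split into its parts as in the Python sketch below. Only the example string `DB://cluster25/11458305` comes from this message; the function name, the field names, and the assumption that every pointer has exactly this `DB://<cluster>/<id>` shape are hypothetical.

```python
from typing import NamedTuple


class TextPointer(NamedTuple):
    cluster: str
    blob_id: int


def parse_text_pointer(pointer: str) -> TextPointer:
    """Split a 'DB://<cluster>/<id>' pointer into cluster name and numeric id."""
    prefix = "DB://"
    if not pointer.startswith(prefix):
        raise ValueError(f"not a DB:// pointer: {pointer!r}")
    cluster, _, blob_id = pointer[len(prefix):].partition("/")
    if not cluster or not blob_id.isdigit():
        raise ValueError(f"unexpected pointer layout: {pointer!r}")
    return TextPointer(cluster, int(blob_id))


print(parse_text_pointer("DB://cluster25/11458305"))
```

Note that nothing in such a pointer names the wiki the text belongs to, which is the difficulty described above.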

-- 
Userpage: [[:w:de:User:DaB.]] — PGP: 0x2d3ee2d42b255885

