Re: [Toolserver-l] Wikidata tables

2013-04-19 Thread Platonides
On 19/04/13 01:19, DaB. wrote:
 As you may know, there is a rev_text_id field in the revision table. This field
 points to the text table where the actual text is, or should be, because the
 WMF doesn't store the text there but only a pointer (DB://cluster25/11458305,
 for example). If you query different wikis you will see that most of them point
 to the same cluster or to one with a nearby number. That tells me (and I was
 also told so before) that all text of all WMF projects is stored together.
 The task would now be to separate Wikidata from the rest, but the storage area
 has no clue which wiki a text comes from, which makes the separation very hard.
 And there is another problem: deleted texts are also in this area, so even more
 filtering would be needed.
 I very much doubt that this situation will change at the TS, and I also doubt
 that it will be different for WikiLabs. So I guess your best bet here is the API.
 
 Sincerely,
 DaB.

I think the only hope would be if Wikidata were stored under its own
cluster (for easier differentiation) and at least one server of that
group (the master?) held only that data (so the Toolserver could get its binlogs).
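
To make the pointer discussion concrete, here is a minimal sketch of how such addresses could be taken apart and grouped by cluster. It assumes the layout DB://<cluster>/<numeric id> seen in the single example quoted above; real addresses may carry further path components, which this sketch simply ignores. The names ExternalPointer, parse_pointer and clusters_per_wiki, the wiki labels and the second sample address are hypothetical and only serve the illustration.

```python
import re
from typing import Dict, Iterable, NamedTuple, Optional, Set


class ExternalPointer(NamedTuple):
    cluster: str   # e.g. "cluster25"
    blob_id: int   # e.g. 11458305


# Assumed layout: DB://<cluster>/<numeric id>, possibly followed by more path parts.
_POINTER_RE = re.compile(r"^DB://(?P<cluster>[^/]+)/(?P<blob_id>\d+)")


def parse_pointer(address: str) -> Optional[ExternalPointer]:
    """Split a pointer into cluster name and numeric id, or return None."""
    match = _POINTER_RE.match(address.strip())
    if match is None:
        return None
    return ExternalPointer(match.group("cluster"), int(match.group("blob_id")))


def clusters_per_wiki(samples: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Collect the set of clusters that each wiki's sampled pointers refer to."""
    result: Dict[str, Set[str]] = {}
    for wiki, addresses in samples.items():
        pointers = (parse_pointer(a) for a in addresses)
        result[wiki] = {p.cluster for p in pointers if p is not None}
    return result


if __name__ == "__main__":
    # Hypothetical sample data; only the first address appears in the thread.
    samples = {
        "wiki_a": ["DB://cluster25/11458305"],
        "wiki_b": ["DB://cluster25/11458999", "not a pointer"],
    }
    print(clusters_per_wiki(samples))
```

If several wikis map to the same cluster names, as described above, the pointer alone does not say which project a stored text belongs to; that information would have to come from each wiki's own tables.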

___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette

Re: [Toolserver-l] Wikidata tables

2013-04-19 Thread Magnus Manske
FWIW, I have implemented a query-able stand-alone web server that keeps all
of the wikidata property-item-links in memory. This uses the wikidata dumps
which appear to be rather frequent. I'll try to deploy a test version on
wikilabs (once I figure out how all that works); it seems to be more
favourable to such services than the toolserver.
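
As an illustration of what keeping property-item links in memory can look like, here is a minimal sketch. It is not the implementation described above: it covers only an in-memory lookup structure, not the web server or the dump parsing, and the class LinkIndex, its method names and the IDs Q1, Q2, Q3, P1 and P2 are hypothetical placeholders.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

Link = Tuple[str, str, str]  # (item, property, target item)


class LinkIndex:
    """Keeps item -> property -> item links in two in-memory maps."""

    def __init__(self) -> None:
        # (property, target) -> items that carry this link
        self._sources: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
        # item -> (property, target) pairs stored on that item
        self._links: DefaultDict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def load(self, links: Iterable[Link]) -> None:
        """Add links, e.g. as extracted from a dump by some other step."""
        for item, prop, target in links:
            self._sources[(prop, target)].add(item)
            self._links[item].add((prop, target))

    def items_linking_to(self, prop: str, target: str) -> List[str]:
        """All items that have `prop` pointing at `target`, sorted."""
        return sorted(self._sources.get((prop, target), set()))

    def links_of(self, item: str) -> List[Tuple[str, str]]:
        """All (property, target) pairs stored on `item`, sorted."""
        return sorted(self._links.get(item, set()))


if __name__ == "__main__":
    index = LinkIndex()
    # Placeholder IDs, not meant to refer to real Wikidata entities.
    index.load([("Q1", "P1", "Q2"), ("Q3", "P1", "Q2"), ("Q3", "P2", "Q1")])
    print(index.items_linking_to("P1", "Q2"))
    print(index.links_of("Q3"))
```

Both lookups are plain dictionary accesses, so the cost of a query does not depend on the total number of links; the price is that the whole link set has to fit in RAM and be reloaded when a new dump arrives.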



Re: [Toolserver-l] Wikidata tables

2013-04-19 Thread Kolossos

Hello,
PostgreSQL is now starting to support JSON as well, so we should try to find a
way to make Wikidata available to us, and I would prefer to keep using SQL.

One way could be minutely diff files; that is the approach OpenStreetMap uses.
Alternatively, we could use the API for each updated article.

Any central service is better than leaving everyone to fight with the
problem alone.

I think support for hierarchical information was the key reason for using JSON
instead of a key-value store. That is a point I can understand.


Greetings Tim


On 19.04.2013 00:43, Daniel Schwen wrote:

  if this JSON data is stored where the normal wiki-text is, it is impossible

 To my understanding it is.

  for us to replicate it: Because we have no access to these wmf-servers, there

 IMO that was a questionable design decision. JSON plaintext storage in
 SQL is shoehorning a do-it-yourself object store onto a classical
 RDBMS.
 Postgres at least has hstore. This may even be a genuine use case for
 one of those hipster databases (NoSQL ones like MongoDB etc.). But who knows
 what points were taken into consideration when making this decision.

  would be no way to separate Wikidata from the rest

 I don't understand why separating plaintext storage between different
 projects would be an issue. Is it all lumped into one storage
 namespace?
 I'm sure nobody at Wikimedia would be the least bit motivated to make
 this data available to the toolserver, but maybe it will be usable in
 labs. Otherwise it would be quite a waste of a great opportunity.

  and/or we have not enough disc-space.

 If you can separate it out, I seriously doubt that Wikidata would
 require storage anywhere near as large in magnitude as the other
 Wikimedia projects (at least in the mid-term).


Re: [Toolserver-l] Wikidata tables

2013-04-19 Thread Kolossos
Sounds fine, but will it be possible to join the data with data from
other tables and other projects? These joins are the basis for a lot of
tools on the Toolserver, and I'm not sure how well joins at the application
level will work.
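
For readers unfamiliar with the term, here is a minimal sketch of a join done at the application level: rows come out of SQL, the other side lives in application memory, and the two are matched on a key column with a dictionary lookup. An in-memory SQLite database stands in for a replicated wiki table; the table page_item, its columns, the titles and the IDs are all hypothetical.

```python
import sqlite3
from typing import Any, Dict, List, Sequence, Tuple


def hash_join(sql_rows: Sequence[Tuple[Any, ...]], key_index: int,
              lookup: Dict[Any, Any]) -> List[Tuple[Any, ...]]:
    """Inner join: keep rows whose key is in `lookup`, appending the matched value."""
    joined = []
    for row in sql_rows:
        key = row[key_index]
        if key in lookup:
            joined.append(row + (lookup[key],))
    return joined


def demo() -> List[Tuple[Any, ...]]:
    # Stand-in for a SQL table that maps wiki pages to items (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_item (page_title TEXT, item_id TEXT)")
    conn.executemany(
        "INSERT INTO page_item VALUES (?, ?)",
        [("Example A", "Q1"), ("Example B", "Q2"), ("Example C", "Q3")],
    )
    rows = conn.execute(
        "SELECT page_title, item_id FROM page_item ORDER BY page_title"
    ).fetchall()
    conn.close()

    # Stand-in for data held outside SQL, e.g. answers from a separate service.
    properties_by_item = {"Q1": ["P1"], "Q3": ["P1", "P2"]}
    return hash_join(rows, 1, properties_by_item)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

This costs one pass over the SQL result plus one dictionary lookup per row, which is fine for moderate result sets. What is lost compared with a join inside the database is that the server cannot use its indexes or filter early across both sides, so every candidate row has to be transferred to the application first.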


BTW: With the Templatetiger project we already handle tons of
information from infoboxes in MySQL on the Toolserver; all this data is
highly redundant because we support a lot of languages. So there should
be enough space on the Toolserver in the mid-term. But I also think that Labs
is the right place to start such a project.


Greetings Tim

On 19.04.2013 10:37, Magnus Manske wrote:

FWIW, I have implemented a query-able stand-alone web server that keeps
all of the wikidata property-item-links in memory. This uses the
wikidata dumps which appear to be rather frequent. I'll try to deploy a
test version on wikilabs (once I figure out how all that works); it
seems to be more favourable to such services than the toolserver.





Re: [Toolserver-l] Wikidata tables

2013-04-19 Thread Platonides
Sorry, but your mail doesn't make much sense.

On 19/04/13 11:14, Patricia Pintilie wrote:
 Ok, so how about we recognize what the overall goal is first. Then
 establish the point that it's trying to convey. Only then can we meet in
 the middle and set a plan in motion. I can only assist when a plan of
 action is clear with a definite plan; without it I'm lost on where to
 begin. It seems as if I'm doing my natural instinct research, then I get
 mail from the people I'm reading about... Very interesting this is,
 because I'm left to think my mind is linked to the problems at hand. TS
 is my old signature, my server os will pull up my IP searches. Which
 leads me to believe this is why I am always being brought up in the
 middle of these outstanding conversations you guys are having lol.
 Please send detailed instructions as to how I can help; there should be a
 file known as Mila.eu also known as ro.eula. Find it and run whatever it
 has, thanks.
 -patiently waiting your response.
 -MilaStarX-TS

TS was being used in this thread as an abbreviation of Toolserver. This
mailing list is about the Wikimedia Toolserver, so it's no wonder that it's
mentioned a lot here. :)
I don't know if you have a Toolserver account, or even if you're a
Wikimedian. I don't know what you are referring to with “my server os will pull
up my IP searches”, nor where that “file known as Mila.eu also known
as ro.eula” is supposed to be.
If you were asking for someone to run a query for you on the
Toolserver, try including the request in the email, or at least a link
to what you want.

Best regards


Re: [Toolserver-l] Wikidata tables

2013-04-19 Thread Patricia Pintilie
Waterfall the Anamorphic Development.

[Toolserver-l] Draft online: Roadmap for the setup of Tool Labs and the migration from Toolserver

2013-04-19 Thread Silke Meyer
Hi all!

As you know, Wikimedia Deutschland has the mandate to create, by the next
membership assembly (May 25th), a roadmap for the availability on Wikimedia
Labs / Tool Labs of features that are currently available on the Toolserver.
I have worked on this over the last few weeks and have now published a
draft. I would like to get your feedback on this draft to make sure it
is sound and nothing vital is missing. The current version is not final
but hopefully very close to what will be presented at the membership
assembly next month.

The plan has been drafted in close cooperation with the responsible
people at the Wikimedia Foundation. It still needs a final ok from them
though.

You can find the drafts at:

English version: http://www.mediawiki.org/wiki/Tool_Labs/Roadmap_en

German version: http://www.mediawiki.org/wiki/Tool_Labs/Roadmap_de

To avoid spreading the discussion too far, please use the discussion pages of
the above two pages for comments, suggestions and corrections.

Cheers, Silke

--
Silke Meyer
Internal IT Management and Toolserver Project Management

Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. (030) 219 158 260

http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.


[Toolserver-l] Maintenance: moving data of s1-user / rosemary

2013-04-19 Thread Marlen Caemmerer

Hello,

s1-user / rosemary will run out of space quite soon.
Therefore I will move everything except enwiki itself onto a SAN partition.

This will happen next Friday

26th Apr 1900 - 2200 UTC

I expect to have everything back online within one hour during this time frame.
Please note that s1-user will not be available during the move.

Kind regards,
Marlen/nosy



[Toolserver-l] Maintenance: more disk space for s7

2013-04-19 Thread Marlen Caemmerer

Hello,

I will add more disk space for s7 next Wednesday

23rd April 1900-2200 UTC

I expect to be finished within 1.5 hours.

Kind regards
Marlen/nosy



Re: [Toolserver-l] Maintenance: more disk space for s7

2013-04-19 Thread Marlen Caemmerer

Hey, sorry, wrong date for Wednesday - so it's

 24th April 1900-2200 UTC

Kind regards
Marlen/nosy




[Toolserver-l] Maintenance cassia s5-rr

2013-04-19 Thread Marlen Caemmerer

Hello,

I will move dewiki for cassia off the SAN back to the local storage.

This will happen on

Friday, 3rd May 1900-2100 UTC

I will take this copy of s5-rr offline, but thanks to the other copy of s5-rr you
will probably not experience downtime, only slower responses.

Kind regards
Marlen/nosy



Re: [Toolserver-l] Output files

2013-04-19 Thread Avocato
It doesn't work! My cron jobs still produce output files.

*By the way, cronsub is deprecated in favor of qcronsub*
I didn't understand this. How could I use qcronsub instead of my current approach?


2013/4/18 Platonides platoni...@gmail.com

 On 18/04/13 02:09, Avocato wrote:
  Hello all. I use the approach described at
  https://wiki.toolserver.org/view/Submit.toolserver.org#resource_definition@script
  for running cron jobs. I make a file named something.sh, for example,
  containing the following:

  #!/bin/sh
  #$ -j y
  #$ -o /dev/null
  cd pywikipedia
  python anyscript.py

  Then I add a cron entry like:
  00 21 * * * cronsub -s something % sh $HOME/something.sh

  The problem is that I want my cron jobs to stop producing output files in
  the home folder of my account; I don't want them to produce output at
  all. How can I do that?

 Remove the -s flag of cronsub. It passes qsub the parameters -j y -o
 $HOME/${JOBNAME}.out, which seem to override the ones in the script.

 So you would put just
  00 21 * * * cronsub something % sh $HOME/something.sh

 By the way, cronsub is deprecated in favor of qcronsub

 Cheers


-- 
*User:Avocato--*