[Toolserver-l] Two beginner questions

2010-12-09 Thread Alex Brollo
1. I'm testing my skills by running my script under cron. The Python script
begins with these lines (and it runs):


# -*- coding: utf-8 -*-
#!/usr/bin/python

import os,sys

if not sys.platform=='win32':
    sys.path.append('/home/alebot/pywikipedia')
    os.chdir('/home/alebot/scripts')


Then I tried to move to batch job scheduling, but... my script gives an
error: now the server dislikes the sys.path line. Why? I obviously have to
study more, but what should I study, and where? :-(

2. The script brings to life a Python bot that reads RecentChanges at
10-minute intervals through a cron routine. In your opinion, would an IRC bot
listening to the it.wikisource IRC channel for recent changes be more
efficient? Where can I find a good Python script to read IRC channels?

Thanks - I apologize for such banal questions.

Alex
___
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette

Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread Sumurai8 (DD)
IRC listening with Python is fairly easy; just use a socket:

import socket

# Open a TCP connection to the IRC server.
IRC = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
IRC.connect(('irc.freenode.net', 6667))
while True:
    text = IRC.recv(1024)
    msgs = text.split('\n')
    for msg in msgs:
        # Answer the server's keep-alive PING with a matching PONG.
        if msg.split(' ', 1)[0] == 'PING':
            pong = msg.split(' ', 1)[1]
            IRC.send('PONG %s' % pong)
        print msg

If you want to do things periodically, like writing the output to a
file every 10 minutes, you have to set a timeout. Otherwise the script
will wait at the recv line until it receives data.


Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread Sumurai8 (DD)
Oops, I forgot to put a newline after the PONG message, like this:
IRC.send('PONG %s\n' % pong)

The IRC server will try to process the line once it finds a \n in your message.


Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread River Tarnell
Sumurai8 (DD):
     text = IRC.recv(1024)
     msgs = text.split('\n')

This seems to have a bug: if there are more than 1024 bytes waiting, you could
receive only part of the final message; so you will truncate that message, and
the next recv will receive the other half (which will then be effectively
junk).

- river.

Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread Alex Brollo
2010/12/9 Bryan Tong Minh bryan.tongm...@gmail.com

 On Thu, Dec 9, 2010 at 4:54 PM, Alex Brollo alex.bro...@gmail.com wrote:
  Then I tried to move to batch job scheduling, but... my script gives an
  error: now the server dislikes the sys.path line. Why? I obviously have to
  study more, but what should I study, and where? :-(
 
 Please give the specific error message. It is hard to believe that the
 error is "the server dislikes sys.path".


:-)
It gives an error for that line, precisely mentioning sys.path. I didn't
save the message, but I can try to reproduce it.

Alex

Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread Platonides
Alex Brollo wrote:
 2. The script bring into life a python bot, who reads RecentChanges at
 10 minutes intervals by a cron routine. Is perhaps more efficient a #irc
 bot listening it.wikisource #irc channel for recent changes in your
 opinion?

Yes, especially since you presumably want to get *all* RecentChanges,
which makes the 10-minute value arbitrary.



Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread Alex Brollo


Thanks to all of you. My 10-minute interval readings were only a trick
to work around my lack of skill with continuous listening. I'll study the
socket stuff and your code a little, then - I guess - I'll ask you again
about details/troubles. :-)

Consider that I'm VERY slow when learning new routines, and at present
I have no idea what precisely a socket is. :-)

Alex

Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread Sumurai8 (DD)
It's just a basic idea of how you can make an IRC bot. Possible solutions
are making the buffer bigger or preserving the last message if it
doesn't end with a \n. For WikiLinkBot the first solution works just
fine. (If reading the recent changes every 10 minutes works fine, a
bigger buffer should do the job: at most 500 edits in 600 seconds, so
just make the buffer a little bigger.)
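
The second solution, preserving an incomplete last message until the rest arrives, can be sketched as follows in Python 3. This is only an illustration under stated assumptions: the class name LineBuffer and its feed method are hypothetical and are not code from WikiLinkBot or any bot mentioned here.

```python
class LineBuffer:
    """Collect raw chunks from recv() and hand back only complete lines."""

    def __init__(self):
        # Bytes received so far that do not yet end with a newline.
        self._pending = b""

    def feed(self, chunk):
        """Add a chunk and return the list of complete lines it finished."""
        self._pending += chunk
        # Everything before the last "\n" is complete; the remainder
        # (possibly empty) is kept for the next call.
        *lines, self._pending = self._pending.split(b"\n")
        # IRC lines end with "\r\n", so drop the trailing "\r" as well.
        return [line.rstrip(b"\r") for line in lines]
```

Each result of recv() is passed to feed(), so a message cut in half by the 1024-byte read is only processed once its second half has arrived.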

Sumurai8


Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread Bryan Tong Minh
On Thu, Dec 9, 2010 at 5:36 PM, Platonides platoni...@gmail.com wrote:
 Sumurai8 (DD) wrote:
 Oops, forgot to put a return after the pongmsg, like this:
 IRC.send('PONG %s\n' % pong)

 The IRC-server will try to process the line after it finds a \n in your msg

 According to the protocol, it should be a CRLF (\r\n). Although a bare
 \n seems to be commonly accepted as well.

In fact some ircds only look at the first 4 chars, PONG, regardless of
whether there is a newline at all.


Bryan



Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread Sumurai8 (DD)
Well... you can actually send a PONG message every 3 minutes without
listening to the IRC channel, and the server will gladly accept that
^_^ . That's what I did back when I didn't know about the
timeout option of a socket :) But most of the time it is just better
to follow the rules: end each line with \r\n (nice, I didn't know
about that, so I changed it in my script :) ), send a PONG message
followed by everything that was sent after the PING, etc.
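
Following those rules, a PONG reply can be built like this. It is a small Python 3 sketch; the function name pong_reply is hypothetical, and the input is assumed to be one already-decoded line with its line ending removed.

```python
def pong_reply(line):
    """Return the PONG bytes for a PING line, or None for any other line."""
    command, _, payload = line.partition(" ")
    if command != "PING":
        return None
    # Echo back whatever followed PING and terminate the line with CRLF.
    return ("PONG %s\r\n" % payload).encode("utf-8")
```

The returned bytes can be passed straight to the socket's sendall(); lines that are not a PING produce None and need no reply.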



Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread Михајло Анђелковић
Long ago I noticed that the IRC server was kicking my bot out
after some time, for some reason.

Then I looked closer and noticed that a server ping arrived around the
time of each of those mishaps. All right, so I just added an ad-hoc pong:

public void responsePing(String line) {
    try {
        // Echo back everything after the first colon of the PING line.
        out.println("PONG :" + line.substring(line.indexOf(":") + 1));
    } catch (Throwable th) {
        // ...
    }
}

And told it to go to hell. Pure storytelling is not why I am writing
this; I have a question. I was returning to the server whatever it was
sending me as a ping. This is how it worked about two years ago. Has
something changed?

M



[Toolserver-l] mysql queries killing script

2010-12-09 Thread Mashiah Davidson
Hello,

Some time ago it was announced that long-running queries are killed
when the replag exceeds some value.

I've added a simple piece of code to my tools which prints replag
info whenever a query is killed, and a few days ago I got the
following result:

--
ERROR 1317 (70100) at line 6874: Query execution was interrupted
last replicated timestamp: 20101207214400
replag: 00:00:01
--

Could anyone explain how a query (even a long-running one) could have
been killed when the replag was so good?

mashiah



Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread Platonides
Михајло Анђелковић wrote:
 I have a question. I was returning to the server whatever it was
 sending me as a ping. This is how it worked about two years ago. Has
 something changed?

No.


Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread Platonides
Sumurai8 (DD) wrote:
 Well... you can actually send a PONG message every 3 minutes without
 listening to the IRC channel, and the server will gladly accept that
 ^_^ .

Some ircds will, with every right to do so, not complete your login to
the network in that case.
Strangely, I don't see that kind of protection in freenode's ircd-seven,
despite it allegedly being protected from the JavaScript spam that plagued
the last days of hyperion[1].

[1] http://blog.freenode.net/2010/01/javascript-spam/



Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread Giftpflanze
MZMcBride wrote:
 Alex Brollo wrote:
  2. The script bring(s) into life a python bot, who reads 
  RecentChanges at 10 minutes intervals by a cron routine. Is perhaps 
  more efficient a #irc bot listening it.wikisource #irc channel for 
  recent changes in your opinion? Where can I find a good python 
  script to read #irc channels?
 
 Gahhh, this list. Nobody suggested just using Python's Twisted?[1] So 
 much easier than trying to write your own script in Python using 
 sockets and manual pongs and all that jazz.

The process of IRC listening is not that dramatic, regardless of
language. It can easily be done by hand.

 You're more than welcome to look around my home directory (check 
 /home/mzmcbride/scripts/irc/) for some IRC bots. The bot I 
 specifically use to relay irc.wikimedia.org to irc.freenode.net is on 
 another server, but I'd be happy to post the code for you if you'd 
 like. His name is snitch and he supports all Wikimedia wikis, multiple 
 channels, and stalks per-page, per-user, or per-wiki.

Interesting.

Here’s my RE that parses the RC IRC message in all aspects I know of:

The first line splits the server line into the actual IRC message and
the channel (i.e. wiki) it is coming from. The sending nick is ignored
since no one is allowed to talk at all and because it may change.

The second splits the message into its 6 constituent parts. That works
for every single line at the moment (sometimes a detail changes and we
are left with a mess), even for a log entry rather than an ordinary edit,
because the surrounding markup is present on every line. Sometimes the
message is too long for the IRC format (which allows for 512 bytes
including the final \r\n), so beware of cut-off lines.

The REs are in the re_syntax(n) Tcl-style format (since this is taken
from my MediaWiki Tcl Library [~gifti/bot/irc.tcl]) but can easily be
adapted to other languages, I assume. I use \003 and \002 instead of
the raw control characters for better readability and portability. Note
that the color codes sometimes have leading zeros and sometimes not.

regexp {:[^ ]+ PRIVMSG #([^ ]+) :(.*?)} $line - channel message

regexp {\00314\[\[\00307(.*)\00314\]\]\0034 (.*)\00310 \00302(.*)\003 
\0035\*\003 \00303(.*)\003 \0035\*\003 \(*\002*\+*([^)]*)\002*\)* 
\00310(.*?)\003*} $message - title action url user bytes comment
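
For Python readers, here is one possible adaptation of the two expressions above to the re module. It is a sketch, not a tested port of irc.tcl. The function name parse_rc_line and the group names are assumptions, and three details deliberately differ from the Tcl originals: the first pattern is anchored at the start of the line, both patterns end with an explicit $ (a trailing lazy group would otherwise match nothing in Python), and the bytes group also excludes the bold character \x02.

```python
import re

# Outer IRC line: ":sender PRIVMSG #channel :message"
LINE_RE = re.compile(r":[^ ]+ PRIVMSG #([^ ]+) :(.*)$")

# Inner recent-changes message; \x03 is the color code, \x02 is bold.
RC_RE = re.compile(
    r"\x0314\[\[\x0307(?P<title>.*)\x0314\]\]\x034 (?P<action>.*)\x0310 "
    r"\x0302(?P<url>.*)\x03 \x035\*\x03 \x0303(?P<user>.*)\x03 \x035\*\x03 "
    r"\(*\x02*\+*(?P<bytes>[^)\x02]*)\x02*\)* \x0310(?P<comment>.*?)\x03*$"
)


def parse_rc_line(line):
    """Return a dict of the parsed fields, or None if the line does not match."""
    outer = LINE_RE.match(line.rstrip("\r\n"))
    if outer is None:
        return None
    channel, message = outer.groups()
    inner = RC_RE.search(message)
    if inner is None:
        return None
    fields = inner.groupdict()
    fields["channel"] = channel
    return fields
```

A line that matches yields the keys title, action, url, user, bytes, comment and channel; anything else, including a line cut off by the 512-byte limit, yields None and can be skipped.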

Giftpflanze


Re: [Toolserver-l] Two beginner questions

2010-12-09 Thread Alex Brollo
2010/12/10 Giftpflanze m.p.ropp...@web.de


  Gahhh, this list. Nobody suggested just using Python's Twisted?[1] So
  much easier than trying to write your own script in Python using
  sockets and manual pongs and all that jazz.

I'm going to dig as deep as I can into http://krondo.com/?p=1209. Thanks
for the suggestion. This will help me with the second step: now that I have
my cleanly parsed IRC message... how can I use it for my tasks, sometimes
simple, sometimes far from simple, while listening for other messages? I'd
try a DIY (do it yourself) way... but I guess that it's not such an exotic
problem, and that it's much better to study a little bit first.



 Here’s my RE that parses the RC IRC message in all aspects I know of:



VERY interesting, thank you!

Alex