Re: Getting the web page language

2002-09-08 Thread Octavian Rasnita

Hi,

I am wondering how search engines such as Google and AltaVista determine
the language used in a web page.
I can see that they find pages written in Romanian, for example, even
though the headers of those files are the same as for English ones.
Do they search for certain words?
If you think this is the only solution, is there a Perl module that can
do that?

Thanks.

Teddy's Center: http://teddy.fcc.ro/
Mail: [EMAIL PROTECTED]


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




output pushing

2002-09-08 Thread Hytham Shehab

Hi guys,
Normally, output reaches the client browser only when the whole program
has finished.
Is there a way to push output to the client browser unbuffered (say,
line by line, as IRC clients do) without having to print the whole page
again, something like appending?
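
To make the question concrete, here is a minimal sketch of what unbuffered output could look like in a Perl CGI script. It assumes that turning on autoflush with `$|` is sufficient on the Perl side; the helper name `push_lines` and the sample lines are hypothetical, and a web server or browser may still buffer the response on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: print lines one at a time to the given filehandle,
# with autoflush enabled so each line leaves Perl's buffer immediately.
sub push_lines {
    my ($fh, $delay, @lines) = @_;

    # $| applies to the currently selected handle, so select it briefly.
    my $old = select($fh);
    $| = 1;
    select($old);

    for my $line (@lines) {
        print {$fh} "$line\n";
        sleep $delay if $delay;    # stand-in for slow work between lines
    }
    return scalar @lines;
}

# Send the CGI header first, then the lines as they become available.
print "Content-type: text/plain\n\n";
push_lines(\*STDOUT, 0, 'step 1 done', 'step 2 done', 'step 3 done');
```

Passing a non-zero delay (for example 1) makes the line-by-line arrival visible when the script is run from a terminal.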

--
Hytham Shehab






Re: Getting the web page language

2002-09-08 Thread Kevin Meltzer

Does it matter what language, or what charset? 



You can always look at this and the lang="foo" tags to try to determine
what language, or at least what charset, the page is in. Of course, a
charset (like iso-8859-1) can cover many languages, but at least you
can narrow it down a little if you don't find a lang="foo" tag.
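
As a rough illustration of the idea above, the following sketch pulls a declared language and charset out of raw HTML. The helper name `declared_lang_and_charset` and the sample page are hypothetical, and the regexes are deliberately simple; a real crawler would probably use a proper HTML parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the first lang="..."
# value and the first charset=... value found in the HTML, or undef for
# whichever one is missing.
sub declared_lang_and_charset {
    my ($html) = @_;

    # First lang attribute on any tag, e.g. <html lang="ro">.
    my ($lang) = $html =~ /<[^>]*\blang\s*=\s*["']?([A-Za-z0-9-]+)/i;

    # charset=... inside a <meta> tag, e.g. a Content-Type declaration.
    my ($charset) =
        $html =~ /<meta\b[^>]*\bcharset\s*=\s*["']?([A-Za-z0-9_.:-]+)/i;

    return ($lang, $charset);
}

# Sample page, made up for this example.
my $page = <<'HTML';
<html lang="ro">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
</head>
<body>Buna ziua</body>
</html>
HTML

my ($lang, $charset) = declared_lang_and_charset($page);
printf "lang=%s charset=%s\n",
    defined $lang    ? $lang    : 'unknown',
    defined $charset ? $charset : 'unknown';
```

When neither value is declared, both come back undefined, which is the case where looking at the words in the page itself would be the remaining option.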

Cheers,
Kevin

On Sun, Sep 08, 2002 at 08:05:18AM +0300, Octavian Rasnita ([EMAIL PROTECTED]) said 
something similar to:
> Hi all,
> 
> I want to create a search engine. Please tell me how I can find out the
> languages used in a web page.
> I know that HTML 4.01 uses  for example, but most of the web
> pages don't use this tag.
> 
> What should I test to find the language used?
> 
> Thank you.

-- 
[Writing CGI Applications with Perl - http://perlcgi-book.com]
You are all the Buddha.
-- Buddha (last words)





Getting the web page language

2002-09-08 Thread Octavian Rasnita

Hi all,

I want to create a search engine. Please tell me how I can find out the
languages used in a web page.
I know that HTML 4.01 uses  for example, but most of the web
pages don't use this tag.

What should I test to find the language used?

Thank you.

Teddy's Center: http://teddy.fcc.ro/
Mail: [EMAIL PROTECTED]







Sendback programs output as Perl/CGI

2002-09-08 Thread Bruce Ambraal



Hi all,

Could anyone assist with the following?
Let me explain: when you run the exe below from your C: drive, the
program reads the two input text files and produces two output files.
In addition, the program displays output on the screen.

My request:
1) I want this screen output to be sent back to the user as a
CGI / HTML page.
2) On the same page I want an email prompt enabling the user to enter
his/her email address.
The idea is for the output files (train_out.txt and nnet_out.txt) to be
emailed to the user.
I currently have this Perl script [start_nn1.pl], which does not really
do what I want.
Will you be able to do this for me?
 
Thanks in advance.
Bruce
 


pogram_info.zip
Description: Zip compressed data



Re: Regular expression

2002-09-08 Thread Janek Schleicher

Octavian Rasnita wrote at Sat, 07 Sep 2002 11:03:39 +0200:

> I am trying to match a word boundary or an end of string.
> I would like something like:
> 
> /$word[\bX]/
> 
> where X is the symbol used for end of string. I know that I can use $ but I
> don't think I can use it between brackets.
> 
> I've seen that \b doesn't match the end or beginning of a string.
> I would like to know if there is another symbol that can match both these.

From perldoc perlre:
  \Z  Match only at end of string, or before newline at the end
  \z  Match only at end of string

So you can write

 /$word(\b|\Z)/

Please note that you can't use \b in a character class.
From perldoc perlre:
  (Within character classes "\b"
   represents backspace rather than a word boundary, just as
   it normally does in any double-quoted string.)
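
A small sketch of the two points above, under a few assumptions: the helper names `word_ends_here` and `has_backspace` are hypothetical, `\Q...\E` is added so that the word is matched literally, and a non-capturing group `(?:...)` stands in for the plain parentheses.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word occurs in $text and is followed by
# a word boundary or by the end of the string.
sub word_ends_here {
    my ($text, $word) = @_;
    return $text =~ /\Q$word\E(?:\b|\Z)/ ? 1 : 0;
}

# Inside a character class, \b means a backspace character, so this
# only checks whether $text contains a backspace.
sub has_backspace {
    my ($text) = @_;
    return $text =~ /[\b]/ ? 1 : 0;
}

my @cases = (
    [ 'a cat sat',   'cat' ],   # boundary after "cat"
    [ 'a cat',       'cat' ],   # "cat" at the end of the string
    [ 'concatenate', 'cat' ],   # "cat" followed by a word character
    [ 'price: 5$',   '5$'  ],   # ends in a non-word character
);

for my $case (@cases) {
    my ($text, $word) = @$case;
    printf "%-12s %-4s %d\n", $text, $word, word_ends_here($text, $word);
}

printf "backspace in class: %d\n", has_backspace("a\bb");
```

The last case is where `\Z` earns its place: after a non-word character such as `$` at the very end of the string there is no word boundary, so `\b` alone would fail there.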


Greetings,
Janek






Re: Splitting a string

2002-09-08 Thread Janek Schleicher

Janek Schleicher wrote at Sat, 07 Sep 2002 10:06:34 +0200:

>>> ...
>>> print join "\n", grep defined, ($string =~ /"(.*?)"|(\w+)/g);
>>> ...
> 
> The regexp matches either "(.*?)" in $1 or (\w+) in $1.
                                                      ^^
Of course, I meant $2.

> But the regexp always makes a list of both ($1,$2) for each match.
> That's why half of the list consists of undefined elements,
> that have to be grepped out.
> 
> It's not really nice, but it works.
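
The behaviour described above can be seen in a short sketch. The helper name `split_tokens` and the sample string are hypothetical; the regexp is the one quoted from the earlier post.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the one-liner from the thread: return
# quoted phrases (without the quotes) and bare words, in order.
sub split_tokens {
    my ($string) = @_;

    # In list context with /g, every match contributes both $1 and $2;
    # the group that did not take part in the match is undef.
    my @pairs = ($string =~ /"(.*?)"|(\w+)/g);

    # Drop the undefined halves.
    return grep { defined } @pairs;
}

my $string = 'alpha "beta gamma" delta';
print join("\n", split_tokens($string)), "\n";
```

For this sample the raw match list has six elements (three matches, two groups each), and three of them survive the `grep`.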

