Re: [fpc-pascal] Fast HTML Parser

2014-08-08 Thread Daniel Gaspary
On Fri, Aug 8, 2014 at 9:40 AM, Marco van de Voort  wrote:
> Is there XPath support in fcl-xml?

Yes. But HTML files tend to be very irregular XML, and some files
raise an error when you try to open them.

Things like "" without a closing element were easy to find.
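Both points can be seen in a minimal sketch, assuming a standard Free Pascal install with the fcl-xml units `DOM`, `XMLRead` and `XPath`. The program runs an XPath query over well-formed markup, then feeds it markup that browsers accept but XML does not (an unquoted attribute and an unclosed element), where `ReadXMLFile` raises before any query can run. `SelectValuesXPath` is a hypothetical helper name, and the `<select>`/`<option>` markup is an assumption, since the original example lost its tags in this archive.

```pascal
program XPathDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, DOM, XMLRead, XPath;

{ Hypothetical helper: returns the value attributes of the <option>
  elements under <select name="AName">. Raises if AXml is not
  well-formed XML. The caller owns the returned list. }
function SelectValuesXPath(const AXml, AName: string): TStringList;
var
  Doc: TXMLDocument;
  Input: TStringStream;
  Res: TXPathVariable;
  Nodes: TNodeSet;
  I: Integer;
begin
  Result := TStringList.Create;
  try
    Input := TStringStream.Create(AXml);
    try
      ReadXMLFile(Doc, Input); // strict XML parser: typical HTML fails here
      try
        Res := EvaluateXPathExpression(
          '//select[@name="' + AName + '"]/option/@value',
          Doc.DocumentElement);
        try
          Nodes := Res.AsNodeSet; // owned by Res
          for I := 0 to Nodes.Count - 1 do
            Result.Add(TDOMNode(Nodes[I]).NodeValue);
        finally
          Res.Free;
        end;
      finally
        Doc.Free;
      end;
    finally
      Input.Free;
    end;
  except
    Result.Free; // do not leak the list when parsing fails
    raise;
  end;
end;

var
  Values: TStringList;
begin
  Values := SelectValuesXPath(
    '<select name="sel_x"><option value="1">One</option>' +
    '<option value="2">Two</option></select>', 'sel_x');
  try
    WriteLn(Values.CommaText);
  finally
    Values.Free;
  end;

  // Unquoted attributes and a missing </option>: fine for browsers, not XML.
  try
    SelectValuesXPath('<select name=sel_x><option value=1>One</select>',
      'sel_x').Free;
  except
    on E: Exception do
      WriteLn('Rejected: ', E.ClassName);
  end;
end.
```

The first call prints the two values; the second ends up in the `except` branch, which is the "some files raise an error" situation described above.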
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal


Re: [fpc-pascal] Fast HTML Parser

2014-08-08 Thread Marcos Douglas
On Thu, Aug 7, 2014 at 8:53 PM, luiz americo pereira camara
 wrote:
>
> You can try http://www.benibela.de/sources_en.html#internettools

I'll take a look, thanks.

Regards,
Marcos Douglas


Re: [fpc-pascal] Fast HTML Parser

2014-08-08 Thread Marco van de Voort
In our previous episode, luiz americo pereira camara said:
> >
> 
> It's not just a simple parser. It can extract parts of an HTML page
> through templates. See http://videlibri.sourceforge.net/cgi-bin/xidelcgi

Is there XPath support in fcl-xml?


Re: [fpc-pascal] Fast HTML Parser

2014-08-08 Thread luiz americo pereira camara
2014-08-08 8:28 GMT-03:00 Marco van de Voort :

> In our previous episode, luiz americo pereira camara said:
> > You can try http://www.benibela.de/sources_en.html#internettools
>
> That seems to be something more like sax_html from the fcl-xml package.
>

It's not just a simple parser. It can extract parts of an HTML page
through templates. See http://videlibri.sourceforge.net/cgi-bin/xidelcgi

Luiz

Re: [fpc-pascal] Fast HTML Parser

2014-08-08 Thread Marco van de Voort
In our previous episode, luiz americo pereira camara said:
> You can try http://www.benibela.de/sources_en.html#internettools

That seems to be something more like sax_html from the fcl-xml package.


Re: [fpc-pascal] Fast HTML Parser

2014-08-07 Thread luiz americo pereira camara
You can try http://www.benibela.de/sources_en.html#internettools

Luiz


2014-08-07 10:20 GMT-03:00 Marco van de Voort :

> In our previous episode, Marcos Douglas said:
> > > It has (or at least had) a very simple to use HTML parser that was very
> > > fast. If you don't come right with the above URL, I have some release
> > > archives I know contain the code. Just let me know and I can make it
> > > available.
> >
> > But fasthtmlparser, which you suggested before, is from the powtils
> > source, isn't it? I have had the code for many years, but I did not
> > know about fasthtmlparser. It's very simple. I did not find everything
> > I wanted, but it is a good start.
>
> Yes, it is. The CHM parser is also based on it, but there z505 is listed
> as a contributor, not as the author:
>
>  AUTHOR   : James Azarja
> http://www.jazarsoft.com/
>
>  CONTRIBUTORS : L505
> http://z505.com
>

Re: [fpc-pascal] Fast HTML Parser

2014-08-07 Thread Marco van de Voort
In our previous episode, Marcos Douglas said:
> > It has (or at least had) a very simple to use HTML parser that was very
> > fast. If you don't come right with the above URL, I have some release
> > archives I know contain the code. Just let me know and I can make it
> > available.
>
> But fasthtmlparser, which you suggested before, is from the powtils
> source, isn't it? I have had the code for many years, but I did not
> know about fasthtmlparser. It's very simple. I did not find everything
> I wanted, but it is a good start.

Yes, it is. The CHM parser is also based on it, but there z505 is listed
as a contributor, not as the author:

 AUTHOR   : James Azarja
http://www.jazarsoft.com/

 CONTRIBUTORS : L505
http://z505.com




Re: [fpc-pascal] Fast HTML Parser

2014-08-07 Thread Marcos Douglas
On Wed, Aug 6, 2014 at 6:51 PM, Graeme Geldenhuys
 wrote:
> On 2014-08-06 21:54, Marcos Douglas wrote:
>> I know the tokens to search for, but the HTML can differ a lot from file to file.
>> I can't use an external tool; it has to run inside an application (one that already exists).
>
> Take a look at POWtils (aka PWU or PSP or Pascal Server Pages) created
> by somebody known as Z505. There have been various locations for the
> source code, but I think the latest is at:
>
>   https://code.google.com/p/powtils/
>
> It has (or at least had) a very simple to use HTML parser that was very
> fast. If you don't come right with the above URL, I have some release
> archives I know contain the code. Just let me know and I can make it
> available.

But fasthtmlparser, which you suggested before, is from the powtils
source, isn't it? I have had the code for many years, but I did not
know about fasthtmlparser. It's very simple. I did not find everything
I wanted, but it is a good start.

Best regards,
Marcos Douglas

PS: Like you, I use FPC in real applications in production, so I always
have a deadline, and it is always short. Finding good code to help in
our projects is very valuable because it saves us time. Thanks.


Re: [fpc-pascal] Fast HTML Parser

2014-08-07 Thread Michael Schnell

On 08/06/2014 07:54 PM, Rainer Stratmann wrote:

> It's not that difficult to write yourself.

In fact, my son once wrote (using Delphi) a parser that creates a
list of hierarchically linked objects from HTML code and can also write
an HTML file from this structure.


So you can read a file, use straightforward programming to modify the
content, and write it back.
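The read/modify/write design described above can be illustrated with a small node type. This is a hypothetical sketch, not the Delphi unit mentioned: `THtmlNode`, its fields and `ToHTML` are invented names, the tree is built by hand rather than parsed, attributes are kept as raw text, and every element is written back with an explicit closing tag.

```pascal
program HtmlTreeDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, Contnrs;

type
  { Hypothetical node: an element when TagName <> '', otherwise a
    text node whose content is in Text. }
  THtmlNode = class
  public
    TagName: string;
    Attrs: string;          // raw attribute text, e.g. 'name="sel_x"'
    Text: string;
    Parent: THtmlNode;
    Children: TObjectList;  // owns and frees its child nodes
    constructor Create(const ATag: string; const AAttrs: string = '');
    destructor Destroy; override;
    function AddElement(const ATag: string; const AAttrs: string = ''): THtmlNode;
    procedure AddText(const AText: string);
    function ToHTML: string;
  end;

constructor THtmlNode.Create(const ATag: string; const AAttrs: string);
begin
  inherited Create;
  TagName := ATag;
  Attrs := AAttrs;
  Children := TObjectList.Create(True);
end;

destructor THtmlNode.Destroy;
begin
  Children.Free;
  inherited Destroy;
end;

function THtmlNode.AddElement(const ATag: string; const AAttrs: string): THtmlNode;
begin
  Result := THtmlNode.Create(ATag, AAttrs);
  Result.Parent := Self;
  Children.Add(Result);
end;

procedure THtmlNode.AddText(const AText: string);
var
  Node: THtmlNode;
begin
  Node := THtmlNode.Create('');
  Node.Text := AText;
  Node.Parent := Self;
  Children.Add(Node);
end;

{ Serializes the subtree back to HTML text, depth first. }
function THtmlNode.ToHTML: string;
var
  I: Integer;
begin
  if TagName = '' then
    Exit(Text);
  Result := '<' + TagName;
  if Attrs <> '' then
    Result := Result + ' ' + Attrs;
  Result := Result + '>';
  for I := 0 to Children.Count - 1 do
    Result := Result + THtmlNode(Children[I]).ToHTML;
  Result := Result + '</' + TagName + '>';
end;

var
  Root, Opt: THtmlNode;
begin
  Root := THtmlNode.Create('select', 'name="sel_x"');
  try
    Root.AddElement('option', 'value="1"').AddText('1');
    Root.AddElement('option', 'value="2"').AddText('2');
    // Ordinary code edits the tree: relabel the second option.
    Opt := THtmlNode(Root.Children[1]);
    THtmlNode(Opt.Children[0]).Text := 'two';
    WriteLn(Root.ToHTML);
  finally
    Root.Free;
  end;
end.
```

Run as is, it prints `<select name="sel_x"><option value="1">1</option><option value="2">two</option></select>`. A real parser would fill such a tree from the input instead of building it by hand.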


As the HTML format is not very strict and is a moving target, the parser 
unit is far from perfect, but it is in daily use and does a rather nice 
job.


OTOH, I would not say it's fast :-(

-Michael


Re: [fpc-pascal] Fast HTML Parser

2014-08-07 Thread Graeme Geldenhuys
On 2014-08-06 21:54, Marcos Douglas wrote:
> I know the tokens to search for, but the HTML can differ a lot from file to file.
> I can't use an external tool; it has to run inside an application (one that already exists).

It seems a copy of the Fast HTML Parser unit I spoke of has made its way
into the FPC source code tree.

See /packages/chm/src/fasthtmlparser.pas

Attached is the original one I got from a powtils release. It includes the
parser, a utility unit and a demo program showing the parser in action
with some stats output.


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/


fasthtmlparser.tar.gz
Description: GNU Zip compressed data

Re: [fpc-pascal] Fast HTML Parser

2014-08-07 Thread Graeme Geldenhuys
On 2014-08-06 21:54, Marcos Douglas wrote:
> I know the tokens to search for, but the HTML can differ a lot from file to file.
> I can't use an external tool; it has to run inside an application (one that already exists).

Take a look at POWtils (aka PWU or PSP or Pascal Server Pages) created
by somebody known as Z505. There have been various locations for the
source code, but I think the latest is at:

  https://code.google.com/p/powtils/

It has (or at least had) a very simple to use HTML parser that was very
fast. If you don't come right with the above URL, I have some release
archives I know contain the code. Just let me know and I can make it
available.


Regards,
  - Graeme -



Re: [fpc-pascal] Fast HTML Parser

2014-08-06 Thread Marcos Douglas
On Wed, Aug 6, 2014 at 9:58 PM, Andrew Haines  wrote:
> On 08/06/14 13:50, Marcos Douglas wrote:
>> Hi,
>>
>> Does anyone know a fast HTML parser to use in Pascal code?
>>
>> I need something like this:
>>
>> HTML:
>> 
>> 1
>> 2
>> 
>>
>> I need a function/object to give me only the values:
>> 1
>> 2
>>
>> Something like:
>> S := GetHTMLValues('sel_x');
>>
>> R
>
> There is the unit fasthtmlparser included with fpc in the packages/chm
> folder.
>
> It is pretty basic and just has callbacks for tags and text. I don't
> think it's smart enough to tell you about the name="sel_x" part of
> your tag.

You're right, but I changed my code to use fasthtmlparser and it worked
(at least for now). Thank you.

> Maybe it can be improved.

I agree. If I change something, I'll send a patch.


Marcos Douglas


Re: [fpc-pascal] Fast HTML Parser

2014-08-06 Thread Andrew Haines
On 08/06/14 13:50, Marcos Douglas wrote:
> Hi,
>
> Does anyone know a fast HTML parser to use in Pascal code?
>
> I need something like this:
>
> HTML:
> 
> 1
> 2
> 
>
> I need a function/object to give me only the values:
> 1
> 2
>
> Something like:
> S := GetHTMLValues('sel_x');
>
> R

There is the unit fasthtmlparser included with fpc in the packages/chm
folder.

It is pretty basic and just has callbacks for tags and text. I don't
think it's smart enough to tell you about the name="sel_x" part of
your tag. Maybe it can be improved.
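To make the callback model concrete, here is a minimal sketch of an event-driven tokenizer in the same spirit. It does not use the fasthtmlparser unit: `TTagEvent`, `TTextEvent`, `Tokenize`, `GetAttrValue` and `TSelectCollector` are hypothetical names, and the `<select>`/`<option>` markup is assumed because the original example lost its tags in the archive. The tag callback only sees the raw tag text, so recognising `name="sel_x"` is left to a separate attribute helper plus state kept by the caller.

```pascal
program CallbackTokenizerDemo;
{$mode objfpc}{$H+}
uses SysUtils, StrUtils;

const
  WhiteSpace = [' ', #9, #10, #13];

type
  TTagEvent = procedure(const ATag: string) of object;
  TTextEvent = procedure(const AText: string) of object;

  { Collects the option values of the <select> whose name is Target. }
  TSelectCollector = class
    Target: string;
    Inside: Boolean;
    Values: array of string;
    procedure HandleTag(const ATag: string);
  end;

{ Single pass: every "<...>" run is reported as a tag and everything
  in between as text. No validation and no notion of nesting. }
procedure Tokenize(const AHtml: string; OnTag: TTagEvent; OnText: TTextEvent);
var
  I, Start: Integer;
begin
  I := 1;
  while I <= Length(AHtml) do
  begin
    Start := I;
    if AHtml[I] = '<' then
    begin
      while (I <= Length(AHtml)) and (AHtml[I] <> '>') do Inc(I);
      if Assigned(OnTag) then OnTag(Copy(AHtml, Start, I - Start + 1));
      Inc(I); // step past '>'
    end
    else
    begin
      while (I <= Length(AHtml)) and (AHtml[I] <> '<') do Inc(I);
      if Assigned(OnText) then OnText(Copy(AHtml, Start, I - Start));
    end;
  end;
end;

{ Value of attribute AName in a raw tag such as '<option value="1">'.
  Handles "double", 'single' and unquoted values; the name must follow
  whitespace, so "data-value=" is not taken for "value=". '' if absent. }
function GetAttrValue(const ATag, AName: string): string;
var
  Lower, Key: string;
  P, Q: Integer;
begin
  Result := '';
  Lower := LowerCase(ATag);
  Key := LowerCase(AName) + '=';
  P := PosEx(Key, Lower, 1);
  while (P > 0) and ((P = 1) or not (Lower[P - 1] in WhiteSpace)) do
    P := PosEx(Key, Lower, P + 1);
  if P = 0 then Exit;
  Inc(P, Length(Key));
  Q := P;
  if (P <= Length(ATag)) and (ATag[P] in ['"', '''']) then
  begin
    Inc(Q);
    while (Q <= Length(ATag)) and (ATag[Q] <> ATag[P]) do Inc(Q);
    Inc(P); // drop the opening quote
  end
  else
    while (Q <= Length(ATag)) and not (ATag[Q] in WhiteSpace + ['>']) do Inc(Q);
  Result := Copy(ATag, P, Q - P);
end;

procedure TSelectCollector.HandleTag(const ATag: string);
var
  T: string;
begin
  T := LowerCase(ATag);
  if AnsiStartsStr('<select', T) then
    Inside := GetAttrValue(ATag, 'name') = Target
  else if AnsiStartsStr('</select', T) then
    Inside := False
  else if Inside and AnsiStartsStr('<option', T) then
  begin
    SetLength(Values, Length(Values) + 1);
    Values[High(Values)] := GetAttrValue(ATag, 'value');
  end;
end;

var
  C: TSelectCollector;
  V: string;
begin
  C := TSelectCollector.Create;
  try
    C.Target := 'sel_x';
    Tokenize('<SELECT NAME=sel_x><option value="1">One' +
      '<option value=''2''>Two</SELECT>', @C.HandleTag, nil);
    for V in C.Values do WriteLn(V);
  finally
    C.Free;
  end;
end.
```

`GetAttrValue` is deliberately naive: it can still be fooled by `name=` appearing inside another attribute's quoted value, and it ignores spaces around `=`.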

Regards,

Andrew Haines



Re: [fpc-pascal] Fast HTML Parser

2014-08-06 Thread Marcos Douglas
On Wed, Aug 6, 2014 at 5:46 PM, Mark Morgan Lloyd
 wrote:
> Marcos Douglas wrote:
>>
>> On Wed, Aug 6, 2014 at 2:54 PM, Rainer Stratmann
>>  wrote:
>>>
>>>  On Wednesday 06 August 2014 19:50:44 you wrote:

>>>> Hi,
>>>>
>>>> Does anyone know a fast HTML parser to use in Pascal code?
>>>>
>>>> I need something like this:
>>>>
>>>> HTML:
>>>>
>>>> 1
>>>> 2
>>>>
>>>>
>>>> I need a function/object to give me only the values:
>>>> 1
>>>> 2
>>>>
>>>> Something like:
>>>> S := GetHTMLValues('sel_x');
>>>
>>> It's not that difficult to write yourself.
>>
>>
>> You're right. But I'm looking for the fastest HTML parser to use on huge
>> HTML files... thousands of files.
>
>
> I disagree: it's damn difficult if one isn't working with tightly
> constrained input, and the original question says HTML without specifying
> it's a subset.
>
> There are a couple of places where I parse HTML files that I've created
> myself, i.e. I know exactly what's in them, using, basically, a simple
> recursive-descent parser with some rather flexible ideas about comments
> (i.e. in the above example, name="sel_x" could be lost as a comment).
> However if I'm doing a brute-force job over a large number of files I
> usually use Lynx as a preprocessor, which allows me to use standard
> text-processing utilities to pull named rows out of tabulated reports.

I know the tokens to search for, but the HTML can differ a lot from file to file.
I can't use an external tool; it has to run inside an application (one that already exists).

Thanks,
Marcos Douglas


Re: [fpc-pascal] Fast HTML Parser

2014-08-06 Thread Mark Morgan Lloyd

Marcos Douglas wrote:
>
> On Wed, Aug 6, 2014 at 2:54 PM, Rainer Stratmann
>  wrote:
>>
>>  On Wednesday 06 August 2014 19:50:44 you wrote:
>>>
>>> Hi,
>>>
>>> Does anyone know a fast HTML parser to use in Pascal code?
>>>
>>> I need something like this:
>>>
>>> HTML:
>>>
>>> 1
>>> 2
>>>
>>>
>>> I need a function/object to give me only the values:
>>> 1
>>> 2
>>>
>>> Something like:
>>> S := GetHTMLValues('sel_x');
>>
>> It's not that difficult to write yourself.
>
> You're right. But I'm looking for the fastest HTML parser to use on huge
> HTML files... thousands of files.


I disagree: it's damn difficult if one isn't working with tightly 
constrained input, and the original question says HTML without 
specifying it's a subset.


There are a couple of places where I parse HTML files that I've created
myself, i.e. I know exactly what's in them, using, basically, a simple
recursive-descent parser with some rather flexible ideas about comments 
(i.e. in the above example, name="sel_x" could be lost as a comment). 
However if I'm doing a brute-force job over a large number of files I 
usually use Lynx as a preprocessor, which allows me to use standard 
text-processing utilities to pull named rows out of tabulated reports.


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


Re: [fpc-pascal] Fast HTML Parser

2014-08-06 Thread Marcos Douglas
On Wed, Aug 6, 2014 at 2:54 PM, Rainer Stratmann
 wrote:
>  On Wednesday 06 August 2014 19:50:44 you wrote:
>> Hi,
>>
>> Does anyone know a fast HTML parser to use in Pascal code?
>>
>> I need something like this:
>>
>> HTML:
>> 
>> 1
>> 2
>> 
>>
>> I need a function/object to give me only the values:
>> 1
>> 2
>>
>> Something like:
>> S := GetHTMLValues('sel_x');
>
> It's not that difficult to write yourself.

You're right. But I'm looking for the fastest HTML parser to use on huge
HTML files... thousands of files.

Best regards,
Marcos Douglas


Re: [fpc-pascal] Fast HTML Parser

2014-08-06 Thread Rainer Stratmann
It's not that difficult to write yourself.


 On Wednesday 06 August 2014 19:50:44 you wrote:
> Hi,
> 
> Does anyone know a fast HTML parser to use in Pascal code?
> 
> I need something like this:
> 
> HTML:
> 
> 1
> 2
> 
> 
> I need a function/object to give me only the values:
> 1
> 2
> 
> Something like:
> S := GetHTMLValues('sel_x');
> 
> Regards,
> Marcos Douglas


[fpc-pascal] Fast HTML Parser

2014-08-06 Thread Marcos Douglas
Hi,

Does anyone know a fast HTML parser to use in Pascal code?

I need something like this:

HTML:

1
2


I need a function/object to give me only the values:
1
2

Something like:
S := GetHTMLValues('sel_x');

Regards,
Marcos Douglas
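For reference, the function asked for above might look like the following minimal sketch. It is hypothetical in several ways: this `GetHTMLValues` takes the HTML as an extra parameter, the markup is assumed to be a `<select>` with a double-quoted `name` attribute and `<option>` children whose text is the wanted value (the original example lost its tags in the archive), and case-insensitive substring scanning with `Pos`/`PosEx` stands in for a real parser, so comments, scripts or another element carrying the same `name` can fool it. Scanning like this is cheap, which matters for thousands of files, but it only holds while the markup stays close to the assumed shape.

```pascal
program GetHTMLValuesDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, StrUtils;

{ Hypothetical helper shaped after the requested call, with the HTML
  passed explicitly. Returns the text of each <option> inside the
  element carrying name="AName". The caller owns the returned list. }
function GetHTMLValues(const AHtml, AName: string): TStringList;
var
  Lower: string;
  SelStart, SelEnd, P, TextStart, TextEnd: Integer;
begin
  Result := TStringList.Create;
  Lower := LowerCase(AHtml); // ASCII-only, so indexes still match AHtml
  SelStart := Pos('name="' + LowerCase(AName) + '"', Lower);
  if SelStart = 0 then
    Exit;
  SelEnd := PosEx('</select', Lower, SelStart);
  if SelEnd = 0 then
    SelEnd := Length(Lower) + 1; // tolerate a missing </select>
  P := PosEx('<option', Lower, SelStart);
  while (P > 0) and (P < SelEnd) do
  begin
    TextStart := PosEx('>', Lower, P) + 1;
    if TextStart = 1 then
      Break; // unterminated <option tag
    // the text ends at the next tag; </option> is optional in HTML
    TextEnd := PosEx('<', Lower, TextStart);
    if (TextEnd = 0) or (TextEnd > SelEnd) then
      TextEnd := SelEnd;
    Result.Add(Trim(Copy(AHtml, TextStart, TextEnd - TextStart)));
    P := PosEx('<option', Lower, TextEnd);
  end;
end;

var
  Values: TStringList;
begin
  Values := GetHTMLValues(
    '<select name="sel_x">' + LineEnding +
    '  <option value="a">1</option>' + LineEnding +
    '  <option value="b">2</option>' + LineEnding +
    '</select>', 'sel_x');
  try
    WriteLn(Values.Text);
  finally
    Values.Free;
  end;
end.
```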