Re: [fpc-pascal] Fast HTML Parser
On Fri, Aug 8, 2014 at 9:40 AM, Marco van de Voort wrote:
> There is xpath support in fcl-xml?

Yes. But HTML files are often very irregular XML. Some files raise an error when you try to open them. Things like "" without a closing element are easy to find.
___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Fast HTML Parser
On Thu, Aug 7, 2014 at 8:53 PM, luiz americo pereira camara wrote:
> You can try http://www.benibela.de/sources_en.html#internettools

I will take a look, thanks.

Regards,
Marcos Douglas
Re: [fpc-pascal] Fast HTML Parser
In our previous episode, luiz americo pereira camara said:
> It's not a simple parser. It has the ability to extract part of html
> through templates. See http://videlibri.sourceforge.net/cgi-bin/xidelcgi

There is xpath support in fcl-xml?
Re: [fpc-pascal] Fast HTML Parser
2014-08-08 8:28 GMT-03:00 Marco van de Voort:
> In our previous episode, luiz americo pereira camara said:
> > You can try http://www.benibela.de/sources_en.html#internettools
>
> That seems more like sax_html from the fcl-xml package.

It's not a simple parser. It can extract parts of an HTML document through templates. See http://videlibri.sourceforge.net/cgi-bin/xidelcgi

Luiz
Re: [fpc-pascal] Fast HTML Parser
In our previous episode, luiz americo pereira camara said:
> You can try http://www.benibela.de/sources_en.html#internettools

That seems more like sax_html from the fcl-xml package.
Re: [fpc-pascal] Fast HTML Parser
You can try http://www.benibela.de/sources_en.html#internettools

Luiz

2014-08-07 10:20 GMT-03:00 Marco van de Voort:
> In our previous episode, Marcos Douglas said:
> > > It has (or at least had) a very simple to use HTML parser that was very
> > > fast. If you don't come right with the above URL, I have some release
> > > archives I know contain the code. Just let me know and I can make it
> > > available.
> >
> > But the fasthtmlparser, your tip before, is part of the powtils source, isn't it?
> > I have had the code for many years, but I did not know about
> > fasthtmlparser. It's very simple. I did not find everything I want,
> > but it is a good start.
>
> Yes it is. The CHM parser is also based on it, but there z505 is not listed
> as author but as contributor:
>
> AUTHOR : James Azarja
> http://www.jazarsoft.com/
>
> CONTRIBUTORS : L505
> http://z505.com
Re: [fpc-pascal] Fast HTML Parser
In our previous episode, Marcos Douglas said:
> > It has (or at least had) a very simple to use HTML parser that was very
> > fast. If you don't come right with the above URL, I have some release
> > archives I know contain the code. Just let me know and I can make it
> > available.
>
> But the fasthtmlparser, your tip before, is part of the powtils source, isn't it?
> I have had the code for many years, but I did not know about
> fasthtmlparser. It's very simple. I did not find everything I want,
> but it is a good start.

Yes it is. The CHM parser is also based on it, but there z505 is not listed as author but as contributor:

AUTHOR : James Azarja
http://www.jazarsoft.com/

CONTRIBUTORS : L505
http://z505.com
Re: [fpc-pascal] Fast HTML Parser
On Wed, Aug 6, 2014 at 6:51 PM, Graeme Geldenhuys wrote:
> On 2014-08-06 21:54, Marcos Douglas wrote:
>> I know the tokens to search for, but the HTML files can be very different from each other.
>> I can't use an external tool. It has to be done inside an application (which already exists).
>
> Take a look at POWtils (aka PWU or PSP or Pascal Server Pages) created
> by somebody known as Z505. There have been various locations for the
> source code, but I think the latest is at:
>
> https://code.google.com/p/powtils/
>
> It has (or at least had) a very simple to use HTML parser that was very
> fast. If you don't come right with the above URL, I have some release
> archives I know contain the code. Just let me know and I can make it
> available.

But the fasthtmlparser, your tip before, is part of the powtils source, isn't it? I have had the code for many years, but I did not know about fasthtmlparser. It's very simple. I did not find everything I want, but it is a good start.

Best regards,
Marcos Douglas

PS: Like you, I use FPC in real applications in production, so I always have a short deadline to meet. Finding good code to help with our projects is very welcome because it saves us time. Thanks.
Re: [fpc-pascal] Fast HTML Parser
On 08/06/2014 07:54 PM, Rainer Stratmann wrote:
> It's not that difficult to write yourself.

In fact, my son once wrote (using Delphi) a parser that creates a list of hierarchically linked objects from HTML code and can also write an HTML file from this structure. So you can read a file, use straightforward programming to modify the content, and write it back.

As the HTML format is not very strict and is a moving target, the parser unit is far from perfect, but it is in daily use and does a rather nice job. OTOH, I would not say it's fast :-( .

-Michael
Re: [fpc-pascal] Fast HTML Parser
On 2014-08-06 21:54, Marcos Douglas wrote:
> I know the tokens to search for, but the HTML files can be very different from each other.
> I can't use an external tool. It has to be done inside an application (which already exists).

It seems a copy of the Fast HTML Parser unit I spoke of has made its way into the FPC source code tree. See /packages/chm/src/fasthtmlparser.pas

Attached is the original one I got from a powtils release. It includes the parser, a utility unit and a demo program showing the parser in action with some stats output.

Regards,
- Graeme -

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

Attachment: fasthtmlparser.tar.gz (GNU Zip compressed data)
Re: [fpc-pascal] Fast HTML Parser
On 2014-08-06 21:54, Marcos Douglas wrote:
> I know the tokens to search for, but the HTML files can be very different from each other.
> I can't use an external tool. It has to be done inside an application (which already exists).

Take a look at POWtils (aka PWU or PSP or Pascal Server Pages) created by somebody known as Z505. There have been various locations for the source code, but I think the latest is at:

https://code.google.com/p/powtils/

It has (or at least had) a very simple to use HTML parser that was very fast. If you don't come right with the above URL, I have some release archives I know contain the code. Just let me know and I can make it available.

Regards,
- Graeme -

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/
Re: [fpc-pascal] Fast HTML Parser
On Wed, Aug 6, 2014 at 9:58 PM, Andrew Haines wrote:
> On 08/06/14 13:50, Marcos Douglas wrote:
>> Hi,
>>
>> Does anyone know a fast HTML parser to use in Pascal code?
>>
>> I need something like this:
>>
>> HTML:
>>
>> 1
>> 2
>>
>> I need a function/object to give me only the values:
>> 1
>> 2
>>
>> Something like:
>> S := GetHTMLValues('sel_x');
>
> There is the unit fasthtmlparser included with fpc in the packages/chm
> folder.
>
> It is pretty basic and just has callbacks for tags and text. I don't
> think it's smart enough to tell you of the
> name="sel_x" part of your tag.

You're right, but I changed my code to use fasthtmlparser and it worked (at least for now). Thank you.

> Maybe it can be improved.

I agree. If I change something, I'll send a patch.

Marcos Douglas
Re: [fpc-pascal] Fast HTML Parser
On 08/06/14 13:50, Marcos Douglas wrote:
> Hi,
>
> Does anyone know a fast HTML parser to use in Pascal code?
>
> I need something like this:
>
> HTML:
>
> 1
> 2
>
> I need a function/object to give me only the values:
> 1
> 2
>
> Something like:
> S := GetHTMLValues('sel_x');

There is the unit fasthtmlparser included with fpc in the packages/chm folder.

It is pretty basic and just has callbacks for tags and text. I don't think it's smart enough to tell you of the name="sel_x" part of your tag.

Maybe it can be improved.

Regards,
Andrew Haines
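As a rough illustration of the callback approach described above, here is a minimal sketch that tracks the tag of interest by hand. It assumes the fasthtmlparser unit shipped in packages/chm exposes THTMLParser with a Create(string) constructor, the OnFoundTag and OnFoundText callbacks, and Exec; check this against your FPC version. The HTML in the original post was lost in the archive, so the select/option markup is a guess, and TOptionCollector and GetHTMLValues are hypothetical names used only for this example.

```pascal
program SelectValuesSketch;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils,
  fasthtmlparser;  // assumed: the copy shipped in packages/chm

type
  { Hypothetical helper: collects the text of each <option> found
    inside the <select> whose name attribute matches. }
  TOptionCollector = class
  private
    FWanted: string;      // upper-cased attribute text to look for
    FInSelect: Boolean;   // currently inside the wanted <select>
    FInOption: Boolean;   // currently inside an <option> of that <select>
    FValues: TStrings;    // not owned; supplied by the caller
  public
    constructor Create(const AName: string; AValues: TStrings);
    procedure FoundTag(NoCaseTag, ActualTag: string);
    procedure FoundText(Text: string);
  end;

constructor TOptionCollector.Create(const AName: string; AValues: TStrings);
begin
  inherited Create;
  // Assumption: the attribute is written as name="..." with double quotes.
  FWanted := 'NAME="' + UpperCase(AName) + '"';
  FValues := AValues;
end;

procedure TOptionCollector.FoundTag(NoCaseTag, ActualTag: string);
var
  T: string;
begin
  // Normalise the tag here instead of relying on how NoCaseTag is cased.
  T := UpperCase(Trim(ActualTag));
  if (T <> '') and (T[1] = '<') then
    Delete(T, 1, 1);
  if Copy(T, 1, 7) = '/SELECT' then
  begin
    FInSelect := False;
    FInOption := False;
  end
  else if Copy(T, 1, 6) = 'SELECT' then
    FInSelect := Pos(FWanted, T) > 0
  else if Copy(T, 1, 7) = '/OPTION' then
    FInOption := False
  else if Copy(T, 1, 6) = 'OPTION' then
    FInOption := FInSelect;
end;

procedure TOptionCollector.FoundText(Text: string);
begin
  // Keep only non-blank text that sits inside a wanted <option>.
  if FInOption and (Trim(Text) <> '') then
    FValues.Add(Trim(Text));
end;

{ Hypothetical function named after the one in the original question.
  Returns the collected values, one per line. }
function GetHTMLValues(const AHtml, AName: string): string;
var
  Values: TStringList;
  Collector: TOptionCollector;
  Parser: THTMLParser;
begin
  Values := TStringList.Create;
  Collector := TOptionCollector.Create(AName, Values);
  Parser := THTMLParser.Create(AHtml);
  try
    Parser.OnFoundTag := @Collector.FoundTag;
    Parser.OnFoundText := @Collector.FoundText;
    Parser.Exec;
    Result := Values.Text;
  finally
    Parser.Free;
    Collector.Free;
    Values.Free;
  end;
end;

begin
  // Sample markup is an assumption; the original post's HTML was lost.
  Write(GetHTMLValues(
    '<select name="sel_x"><option>1</option><option>2</option></select>',
    'sel_x'));
end.
```

The parser hands each tag to the callback as plain text, so the name="sel_x" matching has to be done by the caller. The simple Pos check here would need replacing with real attribute parsing for single-quoted or unquoted attributes.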
Re: [fpc-pascal] Fast HTML Parser
On Wed, Aug 6, 2014 at 5:46 PM, Mark Morgan Lloyd wrote:
> Marcos Douglas wrote:
>> On Wed, Aug 6, 2014 at 2:54 PM, Rainer Stratmann wrote:
>>> On Wednesday 06 August 2014 19:50:44 you wrote:
>>>> Hi,
>>>>
>>>> Does anyone know a fast HTML parser to use in Pascal code?
>>>>
>>>> I need something like this:
>>>>
>>>> HTML:
>>>>
>>>> 1
>>>> 2
>>>>
>>>> I need a function/object to give me only the values:
>>>> 1
>>>> 2
>>>>
>>>> Something like:
>>>> S := GetHTMLValues('sel_x');
>>>
>>> It's not that difficult to write yourself.
>>
>> You're right. But I'm looking for the fastest HTML parser to use on huge
>> HTML files... thousands of files.
>
> I disagree: it's damn difficult if one isn't working with tightly
> constrained input, and the original question says HTML without specifying
> it's a subset.
>
> There's a couple of places where I parse HTML files that I've created
> myself, i.e. I know exactly what's in them, using, basically, a simple
> recursive-descent parser with some rather flexible ideas about comments
> (i.e. in the above example, name="sel_x" could be lost as a comment).
> However if I'm doing a brute-force job over a large number of files I
> usually use Lynx as a preprocessor, which allows me to use standard
> text-processing utilities to pull named rows out of tabulated reports.

I know the tokens to search for, but the HTML files can be very different from each other. I can't use an external tool. It has to be done inside an application (which already exists).

Thanks,
Marcos Douglas
Re: [fpc-pascal] Fast HTML Parser
Marcos Douglas wrote:
> On Wed, Aug 6, 2014 at 2:54 PM, Rainer Stratmann wrote:
>> On Wednesday 06 August 2014 19:50:44 you wrote:
>>> Hi,
>>>
>>> Does anyone know a fast HTML parser to use in Pascal code?
>>>
>>> I need something like this:
>>>
>>> HTML:
>>>
>>> 1
>>> 2
>>>
>>> I need a function/object to give me only the values:
>>> 1
>>> 2
>>>
>>> Something like:
>>> S := GetHTMLValues('sel_x');
>>
>> It's not that difficult to write yourself.
>
> You're right. But I'm looking for the fastest HTML parser to use on huge
> HTML files... thousands of files.

I disagree: it's damn difficult if one isn't working with tightly constrained input, and the original question says HTML without specifying it's a subset.

There's a couple of places where I parse HTML files that I've created myself, i.e. I know exactly what's in them, using, basically, a simple recursive-descent parser with some rather flexible ideas about comments (i.e. in the above example, name="sel_x" could be lost as a comment). However if I'm doing a brute-force job over a large number of files I usually use Lynx as a preprocessor, which allows me to use standard text-processing utilities to pull named rows out of tabulated reports.

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]
Re: [fpc-pascal] Fast HTML Parser
On Wed, Aug 6, 2014 at 2:54 PM, Rainer Stratmann wrote:
> On Wednesday 06 August 2014 19:50:44 you wrote:
>> Hi,
>>
>> Does anyone know a fast HTML parser to use in Pascal code?
>>
>> I need something like this:
>>
>> HTML:
>>
>> 1
>> 2
>>
>> I need a function/object to give me only the values:
>> 1
>> 2
>>
>> Something like:
>> S := GetHTMLValues('sel_x');
>
> It's not that difficult to write yourself.

You're right. But I'm looking for the fastest HTML parser to use on huge HTML files... thousands of files.

Best regards,
Marcos Douglas
Re: [fpc-pascal] Fast HTML Parser
It's not that difficult to write yourself.

On Wednesday 06 August 2014 19:50:44 you wrote:
> Hi,
>
> Does anyone know a fast HTML parser to use in Pascal code?
>
> I need something like this:
>
> HTML:
>
> 1
> 2
>
> I need a function/object to give me only the values:
> 1
> 2
>
> Something like:
> S := GetHTMLValues('sel_x');
>
> Regards,
> Marcos Douglas
[fpc-pascal] Fast HTML Parser
Hi,

Does anyone know a fast HTML parser to use in Pascal code?

I need something like this:

HTML:

1
2

I need a function/object to give me only the values:
1
2

Something like:
S := GetHTMLValues('sel_x');

Regards,
Marcos Douglas