Marvin Humphrey wrote on 11/7/10 10:06 AM:

> One thing I'm realizing is that I really don't want to contribute or maintain
> C sample code which operates in a web context. C is too prone to security
> vulnerabilities, its string handling sucks so you need waaaaay more code, and
> things like URI escaping and HTML tag stripping aren't offered by the standard
> library and aren't easy to fake up. It's the wrong language for a quickie CGI
> app.

Agreed.

> I think it makes more sense for the C tutorial to operate in a command-line
> context, even if the tutorials for other host language bindings target the
> web. But then we have a problem: the current HTML format of our sample corpus
> isn't suitable. The solution, I think, is to change all those docs to plain
> text, with the title on the first line:
>
>     Amendment XIII
>
>     1. Neither slavery nor involuntary servitude, except as a punishment for
>     crime whereof the party shall have been duly convicted, shall exist within
>     the United States, or any place subject to their jurisdiction.
>
>     2. Congress shall have power to enforce this article by appropriate
>     legislation.
>
> Plain text will work for either web or command-line context, and as a bonus,
> for web-context tutorials we no longer have to either pull in an HTML parsing
> dependency or do something hackish with regexes.

Agreed. For what it's worth, my intention, once we have a working C API, is to
include as part of libswish3 a "swish_lucy.c" example of using Lucy with
libswish3, which *does* do all the HTML/XML parsing. See, for example:

http://dev.swish-e.org/browser/libswish3/trunk/src/swish_lint.c
http://dev.swish-e.org/browser/libswish3/trunk/src/xapian/swish_xapian.cpp

-- 
Peter Karman . http://peknet.com/ . [email protected]
