[ https://issues.apache.org/jira/browse/CONNECTORS-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167439#comment-13167439 ]
Karl Wright commented on CONNECTORS-275:
----------------------------------------

Would you like to create a new ticket to cover changes to the Web Connector itself, as we've discussed above? I think this ticket basically covers documentation only. Once you've decided what you need, I think we'd want a new ticket specific to those code changes.

> Clarify documentation as to how to set up session login for web connector
> -------------------------------------------------------------------------
>
>                 Key: CONNECTORS-275
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-275
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Documentation, Web connector
>    Affects Versions: ManifoldCF 0.4
>            Reporter: Karl Wright
>         Attachments: CONNECTORS-275.patch
>
>
> A book reader has this comment, which basically implies that we need to
> improve the documentation for the web connector:
> "I was excited to get the full version of the online book, but then
> disappointed when it referred back to the online doc for setting up logins
> for Web spidering. The online doc is very vague and only gives one example.
> I've used Ultraseek's and Google's spiders, but I still find the session login
> sequences non-obvious.
> I've got a subscription request into the user mailing list, but here are the
> parts that are not clear.
> I generally understand about using regexes to define sites and sorting out
> content pages from login pages.
> But it's not clear why there are TWO regexes per entry. There's a "Login URL"
> regex, and also a "Form name/link target" regex.
> It's also not clear about the "page type" radio button choices.
> For "redirection", am I saying "look for a redirect event", or am I saying
> "then DO a redirect to this page"?
> And for "form name", what if my login page doesn't have a named form?
> In the case of the site I'm trying to spider, when your session expires, you
> manually go back to an https page and supply your username and password as
> CGI parameters. I know this sounds odd, but it's apparently how a number of
> the sites we're trying to spider work (some proprietary software).
> Karl, I really think the book or Wiki or doc needs 3 or 4 different examples
> of login scenarios.
> Here's the scenario I'm trying, if you'd like to use it:
> Try to fetch: http://site.com/product?id=1234
> If you get a redirect to: http://site.com/Main.asp
> (note that there's no login form or link on this page),
> then invoke this login URL:
> https://site.com/validate?username=me&password=that&otherArg=something
> Note that you can't just visit this page and fill in a form; that gives an
> error. The parameters have to be passed in the URL (I think as a GET).
> Then record the session cookie and try for /product?id=1234 again.
> I realize this is odd; I didn't design it."
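The reader's scenario boils down to a small decision rule applied after each fetch: if a protected page redirects to `Main.asp`, request the login URL directly, keep the session cookie it sets, then retry the original page. The following Java sketch illustrates only that rule, under stated assumptions. It is not the ManifoldCF Web connector API or its login-configuration model. The class name `LoginSequenceSketch`, the method `recoveryPlan`, and the regular expressions are hypothetical, derived from the example URLs above.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the reader's login scenario. This is NOT the
// ManifoldCF Web connector API; names and patterns are illustrative only.
final class LoginSequenceSketch {
    // Assumed pattern for content pages that need a valid session cookie.
    static final Pattern PROTECTED_PAGE =
            Pattern.compile("http://site\\.com/product\\?id=\\d+");

    // A redirect to this page is taken to mean the session is missing or expired.
    static final Pattern SESSION_EXPIRED_REDIRECT =
            Pattern.compile("http://site\\.com/Main\\.asp");

    // Main.asp has no form or link to follow, so the login URL must be
    // requested directly, with the credentials passed as GET parameters.
    static final String LOGIN_URL =
            "https://site.com/validate?username=me&password=that&otherArg=something";

    /**
     * Returns the URLs to fetch, in order, to recover a session after
     * fetching {@code fetchedUrl}: first the login URL (its response sets the
     * session cookie, which the HTTP client must store), then the original
     * page again. Returns an empty list when no login is needed.
     */
    static List<String> recoveryPlan(String fetchedUrl, int status, String location) {
        boolean isRedirect = status >= 300 && status < 400;
        if (isRedirect
                && fetchedUrl != null
                && location != null
                && PROTECTED_PAGE.matcher(fetchedUrl).matches()
                && SESSION_EXPIRED_REDIRECT.matcher(location).matches()) {
            return List.of(LOGIN_URL, fetchedUrl);
        }
        return List.of();
    }
}
```

For example, `recoveryPlan("http://site.com/product?id=1234", 302, "http://site.com/Main.asp")` yields the login URL followed by the product URL, while a normal `200` response yields an empty list. Keeping the decision separate from the HTTP client makes the rule easy to check without a live site. An actual crawler would also need a cookie store (for example `java.net.CookieManager`) so that the session cookie returned by the login request is sent on the retry.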