I think you've added a lot of useful details to the project, and the new ideas look good to me. I can think of some past projects where I could have used them, and that's always a good sign :) I've put a few rough sketches at the bottom, after the quoted thread, to make some of the ideas more concrete.
On 21 February 2014 15:23, Ruben Vereecken <[email protected]> wrote:
> Ever since we last exchanged ideas, I have thought about it quite a lot but sadly found it hard to come up with extra real-world uses for the project. Here I'll try to collect the results of my solo brainstorming sessions and invite anyone and everyone to an open discussion on the subject.
>
> I'm posting this to the scrapy-users list because any user, developer or not, is invited to read it and share their thoughts on my ramblings.
>
> The basic idea is to control spiders through an HTTP API (I'll say REST API from now on; correct me on this if you like), somewhat similar to the currently present WebService.
>
> - As Shane said before, it would be even more helpful if the same API gave access to all spiders encapsulated by one and the same project. So, recapped, spiders are selected either by their name in a parameter or by domain. I like the former better, as it feels more natural to pick the spider in the same place where you tell it what to do. Both are possible.
> - One would be able to dispatch jobs to spiders, either by sending start URLs (cf. start_urls) or by sending Request objects (cf. start_requests).
> - The user would have full control over the results of these crawls. The standard case would be for the spider to return Items (cf. parse). However, the user could also opt for a more interactive approach and intercept Responses as well, effectively bypassing the regular parse method.
> - The user can choose which pipelines items will go through. Pipelines are most often used for cleaning items and saving them in different formats. Since the user is remote, saving might not make as much sense as when the user expects results to appear locally; cleaning items, however, is a different matter.
> - The API supports authentication. I should look into this more properly, but I would like at least support for API keys. Generally these are strings a user supplies to gain access to the API. These keys could have rules tied to them, like rate limits, a maximum number of uses, expiry dates, ...
> - More vague brainstorm material, closer to the currently existing WebService: the user can manipulate CrawlSpider Rules, i.e. get, add and delete them.
>
> The API is useful for those who want to access their spiders remotely, be it for testing, practical or demo purposes. A nice addition would therefore be a small, clean and self-explanatory web interface. It should allow viewing the "raw" requests and responses as they are sent to and received from the API, but also clean representations of these messages. This could be really basic, such as a visual tree-like view of the objects, or really advanced, such as letting the user define widgets for how to render each field. Again, this is just a brainstorm and depends completely on where the emphasis of the project lies.
>
> I'll close this monologue by considering the existing projects around Scrapy that offer similar functionality: Scrapyd and WebService, at least one of which most users have already glanced at. Scrapyd allows starting any spider, and that's basically its greatest strength. WebService, on the other hand, is automatically enabled for a single spider and allows monitoring and controlling that spider, though mostly the former.
> This project is somewhere in between: it should preferably give access to multiple spiders (inside one project) at the same time, while placing the emphasis on interactive control.
>
>
> On 18 February 2014 22:13, Shane Evans <[email protected]> wrote:
>
>>> Thanks for the great answer; Scrapinghub looks really promising, by the way. Generating Parsley sounds interesting, but I feel you've basically got that covered with slybot and a UI on top of it.
>>
>> Sure. I think there is a lot of interesting work here, but it's not well defined yet. There are many cases where slybot will not do exactly what you want, so I like the idea of generating Python and continuing to code from there. It's also better than browser addons for working with XPaths (because it uses Scrapy).
>>
>>> I'm currently back to looking in the direction of an HTTP API, yet I feel the project as we discussed it before is a bit immature on its own. If anyone has needed an HTTP API for their Scrapy spiders that required some more intricate functionality, please get back to me so we can discuss how such an API could be extended beyond communicating with a single spider. In the meantime, I'll keep looking myself.
>>
>> I agree, as it stands it's a bit light. I welcome suggestions; I'll think about it some more too.
>>
>> One addition I thought about: instead of wrapping a single spider, wrap a project and dispatch to any spider, either based on a spider name passed in or on some domain -> spider mapping. This has come up before and would be useful.
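
To make the dispatch idea a bit more concrete (sending start URLs vs. full Request specifications), here is roughly how I picture the server side turning an incoming payload into Request objects. The endpoint path, field names and helper are placeholders I made up, not a proposal for the actual interface:

    # Rough sketch only: the payload fields and the endpoint path are invented.
    from scrapy.http import Request

    def build_requests(payload):
        """Turn a dispatch payload (already parsed from JSON) into Requests."""
        requests = []
        for url in payload.get("start_urls", []):
            requests.append(Request(url))
        for spec in payload.get("requests", []):
            requests.append(Request(
                url=spec["url"],
                method=spec.get("method", "GET"),
                headers=spec.get("headers"),
                body=spec.get("body"),
            ))
        return requests

    # e.g. the body of a hypothetical POST /projects/<project>/spiders/<name>/jobs:
    example_payload = {
        "start_urls": ["http://example.com/products"],
        "requests": [{"url": "http://example.com/products?page=2", "method": "GET"}],
    }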
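
For the API keys, even something this simple would cover expiry dates and a maximum number of uses as a starting point. Again just a sketch; how keys are issued and stored is completely open:

    # Illustrative only: an in-memory key table with optional expiry and a usage cap.
    import time

    API_KEYS = {
        "example-key-123": {"expires": None, "max_uses": 1000, "uses": 0},
    }

    def key_is_valid(key):
        record = API_KEYS.get(key)
        if record is None:
            return False
        if record["expires"] is not None and time.time() > record["expires"]:
            return False
        if record["uses"] >= record["max_uses"]:
            return False
        record["uses"] += 1
        return True

Rate limiting would need a timestamp per key on top of this, but the idea is the same.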
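
And on Shane's domain -> spider mapping: a naive first cut could simply reuse each spider's allowed_domains, something along these lines (purely illustrative; how the mapping would really be configured is an open question):

    # Pick a spider for a URL by matching its host against allowed_domains.
    from urlparse import urlparse  # Python 2, which Scrapy currently targets

    def spider_name_for(url, spider_classes):
        host = urlparse(url).hostname or ""
        for cls in spider_classes:
            for domain in getattr(cls, "allowed_domains", []):
                if host == domain or host.endswith("." + domain):
                    return cls.name
        return None

An explicit mapping in the project settings would of course also work; name-based dispatch stays the simpler option either way.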
