Hello Peng Foo, Thanks for your interest in participating in GSoC 2016 with Scrapy! Sorry for not replying earlier.
Your proposal also arrived in the mentoring dashboard
(https://summerofcode.withgoogle.com). We'll review it in the coming
days and provide feedback if necessary.

Kind regards,
Paul.

On Saturday, March 12, 2016 at 4:47:11 PM UTC+1, Pan Foo wrote:
>
> Hi all,
>
> I'm Peng Foo, a postgraduate student majoring in CS at BUAA, also
> known as Beihang University (BHU), China, which is one of the top
> universities for CS in China. (I know it may be unfamiliar to you, but
> it was shown in "CSI: Cyber" S2E11 :-) If you're interested, I'd love
> to tell you why my university changed its name xD)
>
> <https://pic3.zhimg.com/8559284960725caae8dc347cf4d7943a_b.png>
>
> *I am skilled in Python and have a lot of experience in web scraping
> and web development.*
>
> I major in ML/NLP, and Python is the most popular language in the
> field, so I write Python a lot. I also help my lab with frontend work
> using JavaScript, HTML5, and jQuery.
>
> I am an intern at Sogou Inc. from November 2015 to April 2016. My main
> job is web scraping, data cleaning, and data analysis in Python, so I
> have written a lot of spiders xD
> I've crawled big sites such as Yahoo/Sina Finance, Tencent News, Yahoo
> News, Sina Weibo (a Chinese counterpart of Twitter), and so on.
>
> I also interned at Lenovo in 2014, doing machine learning research in
> Python, mainly on gesture recognition. That is where I first met
> IPython Notebook (back then it was still IPython Notebook, not
> Jupyter :)
> It is now my first choice for writing Python on a remote machine,
> instead of writing the code locally, copying it to the remote server
> with scp, and running it there :)
>
> I was amazed at what a Python IDE in the browser can do, and I was
> shocked when I used IPython Notebook and matplotlib to draw a line
> graph in the IDE instantly!
>
> I really think combining Scrapy and IPython Notebook is a good thing
> to do.
>
> The following are some problems I expect to run into and will need to
> solve:
>
> 1. Showing HTML inside HTML has its own problems, such as CSS class
> conflicts and layout issues, because the Jupyter console is much
> narrower than a browser window.
>
> 2. Cross-site/cross-domain issues, such as code that makes AJAX
> requests, and JavaScript that jumps out of the page.
>
> 3. Security issues, such as malicious code or alert pop-ups.
>
> 4. Performance: what should we do if the user runs too many show-HTML
> cells and the console becomes slow? This is likely to happen, because
> people like to use Jupyter for inline debugging, since it shows
> results immediately :-)
>
> And the following are ideas I've always wanted to make happen, which I
> think fit the Scrapy + IPython idea:
>
> 0. A formatted DOM tree shown in the Jupyter console. In web scraping,
> the DOM is as important as the HTML, so we could perhaps show both the
> HTML page and the DOM tree in the console.
>
> 1. Visual element selection. This is a feature of most browsers'
> developer tools (F12 in Chrome, Firebug in Firefox). It is such a good
> feature that I use it to locate elements every time I write a spider.
> It would be good if, when I select a node in the Jupyter console
> (assuming idea 0 is implemented), I could see the matching element
> highlighted in the HTML page :)
>
> 2. An XPath/CSS selector generator. I've wanted to implement this idea
> for a long time xD. Whenever I scrape something, I wish the XPath and
> the CSS selector would pop up by themselves when I select something in
> the page, whether it is a paragraph, a table, an <a> tag, or a <ul>
> list. Once we can show the HTML page in the Jupyter console, this may
> come true! The XPath or CSS selector could be generated automatically
> when elements in the page are selected, and I'm sure it would help a
> lot!
>
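To make idea 2 a bit more concrete, here is a rough sketch of how a positional XPath could be derived for a selected element. It is only an illustration under assumptions, not how Scrapy or any notebook integration does it: `build_xpath` and the `PAGE` snippet are hypothetical names made up for this example, it uses only the standard library's `xml.etree.ElementTree`, and it assumes well-formed markup (real pages would need a forgiving HTML parser).

```python
import xml.etree.ElementTree as ET


def build_xpath(root, target):
    """Return an absolute, positional XPath for `target` inside `root`.

    Hypothetical helper for illustration only.
    """
    # ElementTree elements have no parent pointers, so build a
    # child -> parent map for the whole tree first.
    parents = {child: parent for parent in root.iter() for child in parent}

    steps = []
    node = target
    while node is not None:
        parent = parents.get(node)
        if parent is None:
            # Reached the root element: no index needed.
            steps.append(node.tag)
        else:
            # Add a 1-based index only when siblings share the same tag.
            same_tag = [c for c in parent if c.tag == node.tag]
            if len(same_tag) > 1:
                steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
            else:
                steps.append(node.tag)
        node = parent
    return "/" + "/".join(reversed(steps))


# Made-up, well-formed sample page (an assumption for this sketch).
PAGE = """
<html>
  <body>
    <div>
      <p>intro</p>
      <ul>
        <li><a href="/one">one</a></li>
        <li><a href="/two">two</a></li>
      </ul>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)
second_link = root.findall(".//a")[1]  # pretend the user clicked this link
xpath = build_xpath(root, second_link)
print(xpath)  # /html/body/div/ul/li[2]/a
```

Purely positional paths like this break easily when the page layout changes, so a real generator would probably prefer `id` or `class` attributes where they exist. With lxml, which Scrapy's selectors are built on, `getpath()` on an element tree returns a similar path without the hand-written walk.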
>
> When I first searched the Python projects for GSoC 2016, I did not
> find one that I'd like to participate in. But earlier today, when I
> saw Scrapy among the Python projects for GSoC 2016, I was so excited
> and thought "this is it!". I just want you to know that I am longing
> for a chance to work with you and contribute code to the Scrapy
> project!
>
> If I am not a good fit for the "IPython IDE for Scrapy" idea, any
> other idea is okay for me :-)
>
> Best regards,
>
> Peng Foo
