Hi there, I'm looking for a Scrapy developer who can help me with the following project. I'm based in Chicago. This would be either a fixed-price project or an hourly assignment, depending on the estimate. If you know someone who would be interested, please let me know. A portion of the crawler is already built. Thanks!

1) CRAWL FOR COMPANY SOCIAL HANDLES - Populate the database with social feeds and qualified media from a given company URL. The social feeds or handles include: LinkedIn, YouTube, Vimeo, Twitter, Facebook, Google+, Blog RSS.

2) CRAWL FOR EMBEDDED FILES - The media types include:
   A. Raw video formats, all types (.MP4, .MOV, etc.)
   B. Embedded video (Vimeo, YouTube)
   C. Embedded documents (SlideShare, Scribd)
   D. Raw documents (PDF)

3) CONTENT IMPORT FROM SOCIAL APIs - For embedded media (YouTube, Vimeo, SlideShare, Scribd), use each discovered handle with the service's API to retrieve additional videos and documents from the social feed, while checking for duplicates against the site crawl.

4) UPDATE WEBSITE MEDIA SCHEDULED CRON - Update the database with any new media (including title, description, date created, URL) from the company source URL as part of an overnight scheduled cron, while checking for duplicates.

5) FILE META ENRICHMENT - Use the Crocodoc text extraction (already built) to enrich file metadata in the Vendori file record, i.e. tags that match the brands, solutions, and channels that are predefined in the Vendori database.

6) CONTENT CONVERSION CRON - Routinely poll the database for remote file URLs and create a copy using Wistia (video) and Crocodoc (PDFs, SlideShare, Scribd).
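
To make the scope of items 1 and 2 a bit more concrete, here is a rough sketch of the kind of link classification involved. It uses only Python's standard library rather than Scrapy itself, and it is not taken from the existing crawler: the host tables, the extension list, and the names LinkCollector, classify, and extract are all illustrative assumptions. Duplicates are dropped by exact URL match only, which is likely too naive for the real project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed lookup tables (not from the existing crawler); extend as needed.
SOCIAL_HOSTS = {
    "linkedin.com": "linkedin",
    "youtube.com": "youtube",
    "vimeo.com": "vimeo",
    "twitter.com": "twitter",
    "facebook.com": "facebook",
    "plus.google.com": "google_plus",
}
EMBED_HOSTS = {
    "youtube.com": "embedded_video",
    "player.vimeo.com": "embedded_video",
    "slideshare.net": "embedded_document",
    "scribd.com": "embedded_document",
}
RAW_EXTENSIONS = {".mp4": "raw_video", ".mov": "raw_video", ".pdf": "raw_document"}


class LinkCollector(HTMLParser):
    """Collect (tag, absolute_url, attrs) for every href or src attribute."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        for name in ("href", "src"):
            value = attr_map.get(name)
            if value:
                self.links.append((tag, urljoin(self.base_url, value), attr_map))


def host_of(url):
    """Return the lowercased host with a leading 'www.' removed."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def classify(tag, url, attrs):
    """Return a category string for the link, or None if it is not of interest."""
    host = host_of(url)
    path = urlparse(url).path.lower()
    # Blog RSS feeds are usually advertised through a <link> element.
    if tag == "link" and (attrs.get("type") or "").lower() == "application/rss+xml":
        return "blog_rss"
    # Embedded players and document viewers arrive as iframes.
    if tag == "iframe" and host in EMBED_HOSTS:
        return EMBED_HOSTS[host]
    # Raw files are recognized by extension only.
    for ext, kind in RAW_EXTENSIONS.items():
        if path.endswith(ext):
            return kind
    # Plain anchors pointing at a known social host count as handles.
    if tag == "a" and host in SOCIAL_HOSTS:
        return SOCIAL_HOSTS[host]
    return None


def extract(html, base_url):
    """Return de-duplicated (category, url) pairs in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    seen = set()
    results = []
    for tag, url, attrs in collector.links:
        kind = classify(tag, url, attrs)
        if kind and url not in seen:
            seen.add(url)
            results.append((kind, url))
    return results
```

Calling extract(page_html, page_url) on a fetched page returns a list of (category, url) pairs that could then be written to the database. In a Scrapy spider the same classify step would presumably run inside the parse callback instead of a hand-rolled HTML parser.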
