Hi there,

I'm looking for a Scrapy developer who can help me with the following 
project. I'm based in Chicago. This could be a fixed-price project or an 
hourly assignment, depending on the estimate. If you know someone who would 
be interested, please let me know. A portion of the crawler is already 
built. Thanks!

1) CRAWL FOR COMPANY SOCIAL HANDLES - Populate the database with social 
feeds and qualified media from a given company URL (a rough spider sketch 
follows the list of feeds below).

The social feeds or handles include:

LinkedIn, YouTube, Vimeo, Twitter, Facebook, Google+, Blog RSS
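
To make the scope concrete, here is a rough sketch of what step 1 might look 
like as a Scrapy spider. The spider name, item fields, and the exact 
social-domain list are placeholders, not part of the crawler that already 
exists:

```python
import scrapy

# Domains treated as social profiles; the exact list would mirror the feeds above.
SOCIAL_DOMAINS = (
    "linkedin.com", "youtube.com", "vimeo.com",
    "twitter.com", "facebook.com", "plus.google.com",
)

class SocialHandleSpider(scrapy.Spider):
    name = "social_handles"

    def __init__(self, company_url=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # The given company URL is the single start page for this pass.
        self.start_urls = [company_url] if company_url else []

    def parse(self, response):
        # Outbound links pointing at known social networks become handle records.
        for href in response.css("a::attr(href)").getall():
            if any(domain in href for domain in SOCIAL_DOMAINS):
                yield {"company_url": response.url, "social_handle": href}
        # RSS/Atom feeds advertised in the page head cover the "Blog RSS" case.
        for feed in response.css(
            'link[type="application/rss+xml"]::attr(href), '
            'link[type="application/atom+xml"]::attr(href)'
        ).getall():
            yield {"company_url": response.url, "feed_url": response.urljoin(feed)}
```

It could be run along the lines of 
`scrapy crawl social_handles -a company_url=https://example.com -o handles.json` 
(example.com is a placeholder).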


2) CRAWL FOR EMBEDDED FILES (a rough sketch follows the media type list 
below)

The media types include: 

A. Raw video formats, all types (.MP4, .MOV, etc.)

B. Embedded video (Vimeo, YouTube)

C. Embedded documents (SlideShare, Scribd)

D. Raw documents (PDF)
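
A minimal sketch of step 2, assuming the discovered records flow through a 
normal Scrapy item pipeline; the extension lists and iframe host checks are 
illustrative and would need tuning against real company sites:

```python
import scrapy

# Illustrative extension and host lists matching media types A-D above.
VIDEO_EXTS = (".mp4", ".mov", ".avi", ".wmv", ".m4v")
DOC_EXTS = (".pdf",)
EMBED_HOSTS = ("youtube.com/embed", "player.vimeo.com",
               "slideshare.net", "scribd.com")

class EmbeddedMediaSpider(scrapy.Spider):
    name = "embedded_media"
    # start_urls would be seeded from the pages discovered in step 1.

    def parse(self, response):
        # A / D: raw video and document files linked directly from the page.
        for href in response.css("a::attr(href), source::attr(src)").getall():
            url = response.urljoin(href)
            if url.lower().endswith(VIDEO_EXTS + DOC_EXTS):
                yield {"page": response.url, "file_url": url}
        # B / C: iframe-based embeds (YouTube, Vimeo, SlideShare, Scribd).
        for src in response.css("iframe::attr(src)").getall():
            if any(host in src for host in EMBED_HOSTS):
                yield {"page": response.url, "embed_url": response.urljoin(src)}
```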

3) CONTENT IMPORT FROM SOCIAL APIs - For embedded media sources (YouTube, 
Vimeo, SlideShare, Scribd), use each service's API together with the 
discovered handle to retrieve additional videos and documents beyond those 
found on the site, while checking for duplicates against the site crawl.
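
As an example of step 3, here is a sketch for one of the services (YouTube). 
The endpoint and parameters come from the public Data API v3 and should be 
verified against current docs; `known_urls` stands in for whatever duplicate 
index the site crawl produces. The other services would follow the same 
pattern with their own APIs:

```python
import requests

def fetch_channel_videos(channel_id, api_key):
    """List videos for a discovered YouTube channel via the Data API v3
    search endpoint (parameters should be checked against current docs)."""
    resp = requests.get(
        "https://www.googleapis.com/youtube/v3/search",
        params={"part": "snippet", "channelId": channel_id,
                "maxResults": 50, "type": "video", "key": api_key},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("items", [])

def import_new_videos(channel_id, api_key, known_urls):
    """Yield only the videos the site crawl has not already recorded."""
    for item in fetch_channel_videos(channel_id, api_key):
        video_url = "https://www.youtube.com/watch?v=" + item["id"]["videoId"]
        if video_url not in known_urls:  # duplicate check against the crawl
            yield {
                "url": video_url,
                "title": item["snippet"]["title"],
                "description": item["snippet"]["description"],
                "date_created": item["snippet"]["publishedAt"],
            }
```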

4) UPDATE WEBSITE MEDIA SCHEDULED CRON - Update the database with any new 
media (including title, description, date created, URL) from the company 
source URL as part of an overnight scheduled cron job, while checking for 
duplicates.
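
Step 4 could be wired up as a small driver script invoked from cron. The 
crontab line, the database helper, and the assumption that the spider seeds 
its start URLs from a `company_url` argument are all placeholders:

```python
# update_media.py - run overnight, e.g. from crontab:
#   0 2 * * * cd /path/to/project && python update_media.py
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def company_urls_from_db():
    # Placeholder: would query the database for the company source URLs.
    return ["https://example.com"]

if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())
    for url in company_urls_from_db():
        # "embedded_media" is the spider sketched in step 2; it is assumed to
        # seed its start URLs from company_url, and an item pipeline is
        # assumed to skip records whose URL already exists in the database.
        process.crawl("embedded_media", company_url=url)
    process.start()
```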

5) FILE META ENRICHMENT - Use Crocodoc text extraction (already built) to 
enrich file metadata in the Vendori file record, i.e. tags that match the 
brands, solutions, and channels pre-defined in the Vendori database.
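
Step 5 essentially reduces to matching the Crocodoc-extracted text against 
the pre-defined vocabulary. A minimal sketch, assuming the vocabulary is 
available as a dict of tag type to terms (that structure is an assumption, 
not the actual Vendori schema):

```python
import re

def enrich_tags(extracted_text, vocab):
    """Match Crocodoc-extracted text against the pre-defined vocabulary.
    `vocab` is assumed to map tag type -> list of terms, e.g.
    {"brands": [...], "solutions": [...], "channels": [...]}."""
    found = {}
    lowered = extracted_text.lower()
    for tag_type, terms in vocab.items():
        hits = [term for term in terms
                if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)]
        if hits:
            found[tag_type] = hits
    return found  # merged into the Vendori file record as tags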

6) CONTENT CONVERSION CRON - Routinely poll the database for remote file 
URLs and create a hosted copy using Wistia (video) and Crocodoc (PDFs, 
SlideShare, Scribd).
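
And a sketch of the step 6 polling job. The Wistia and Crocodoc endpoints 
and parameters are written from memory and must be confirmed against the 
vendors' docs; `pending_remote_files` is a placeholder for the real 
database query:

```python
import requests

VIDEO_EXTS = (".mp4", ".mov", ".avi", ".wmv")

def pending_remote_files():
    # Placeholder: query the database for file records without a hosted copy.
    return []

def convert_pending(wistia_password, crocodoc_token):
    for record in pending_remote_files():
        url = record["file_url"]
        if url.lower().endswith(VIDEO_EXTS):
            # Wistia's upload API can import from a remote URL
            # (endpoint/params from memory; confirm against Wistia docs).
            resp = requests.post(
                "https://upload.wistia.com",
                data={"api_password": wistia_password, "url": url},
                timeout=120,
            )
        else:
            # Documents (PDF, SlideShare, Scribd) would go to Crocodoc
            # (same caveat: confirm the current endpoint and parameters).
            resp = requests.post(
                "https://crocodoc.com/api/v2/document/upload",
                data={"token": crocodoc_token, "url": url},
                timeout=120,
            )
        resp.raise_for_status()
        record["hosted_copy"] = resp.json()  # persist back to the database
```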
