Re: pyspark+spacy throwing pickling exception

2018-02-15 Thread Selvam Raman
Hi, I solved the issue by extracting the method into another class. Failure: Class extract.py - contains the whole implementation. Because everything lived in this single class, the driver tried to serialize the spaCy (English) object and send it to the executors, and that is where I hit the pickling exception. Success: Class

Re: pyspark+spacy throwing pickling exception

2018-02-15 Thread Holden Karau
So you left out the exception. On one hand I’m also not sure how well spacy serializes, so to debug this I would start off by moving the nlp = inside of my function and see if it still fails.

On Thu, Feb 15, 2018 at 9:08 PM Selvam Raman wrote:
> import spacy
>
> nlp =

pyspark+spacy throwing pickling exception

2018-02-15 Thread Selvam Raman
import spacy

nlp = spacy.load('en')

def getPhrases(content):
    phrases = []
    doc = nlp(str(content))
    for chunks in doc.noun_chunks:
        phrases.append(chunks.text)
    return phrases

The above function retrieves the noun phrases from the content and returns a list of phrases.