But you can still use the Stanford NLP library and distribute it through
Spark, right?
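
On the original NLTK question, one operator that might fit is RDD.pipe, which
streams each partition's records through an external process over
stdin/stdout (from a Scala DStream it could be reached via transform, e.g.
stream.transform(_.pipe("python nltk_worker.py --pipe"))). Below is a rough
sketch of the Python side only. The script name nltk_worker.py, the --pipe
flag, and the function names are hypothetical, and the regex tokenizer is just
a stand-in so the sketch runs without NLTK; the idea would be to swap in
nltk.word_tokenize once NLTK and its data are installed on every executor.

```python
import re
import sys

# Stand-in tokenizer (assumption): replace with nltk.word_tokenize when NLTK
# and its tokenizer data are available on all executors.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(line):
    """Split a line into word and punctuation tokens."""
    return TOKEN_RE.findall(line)


def process(lines):
    """Yield one tab-separated line of tokens per input record."""
    for line in lines:
        yield "\t".join(tokenize(line.rstrip("\n")))


def main(stdin=sys.stdin, stdout=sys.stdout):
    # RDD.pipe writes one record per line to the process's stdin and reads
    # one output element per line from its stdout.
    for out in process(stdin):
        stdout.write(out + "\n")


# Hypothetical --pipe flag: only read stdin when explicitly asked to.
if __name__ == "__main__" and "--pipe" in sys.argv[1:]:
    main()
```

This starts one Python process per partition and passes plain text lines, so
records have to be serialized to strings on the Scala side, and it will
probably not be fast.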

On Sun, Nov 26, 2017 at 3:31 PM, Holden Karau <hol...@pigscanfly.ca> wrote:

> So it’s certainly doable (it’s not super easy, mind you), but until the
> Arrow UDF release goes out it will be rather slow.
>
> On Sun, Nov 26, 2017 at 8:01 AM ashish rawat <dceash...@gmail.com> wrote:
>
>> Hi,
>>
>> Has anyone tried running NLTK (Python) with Spark Streaming (Scala)? I
>> was wondering whether this is a good idea and which Spark operators are
>> the right ones for it. We don't want to run our transformations in Python
>> (PySpark), but after the transformations we need to run some natural
>> language processing operations, and we don't want to restrict the
>> functions data scientists can use to Spark's natural language library. So
>> Spark Streaming with NLTK looks like the right option from the
>> perspective of fast data processing and data science flexibility.
>>
>> Regards,
>> Ashish
>>
> --
> Twitter: https://twitter.com/holdenkarau
>
