But you can still use the Stanford NLP library and distribute the work through Spark, right?

On Sun, Nov 26, 2017 at 3:31 PM, Holden Karau <hol...@pigscanfly.ca> wrote:

> So it’s certainly doable (it’s not super easy mind you), but until the
> arrow udf release goes out it will be rather slow.
>
> On Sun, Nov 26, 2017 at 8:01 AM ashish rawat <dceash...@gmail.com> wrote:
>
>> Hi,
>>
>> Has someone tried running NLTK (python) with Spark Streaming (scala)? I
>> was wondering if this is a good idea and what are the right Spark operators
>> to do this? The reason we want to try this combination is that we don't
>> want to run our transformations in python (pyspark), but after the
>> transformations, we need to run some natural language processing operations
>> and we don't want to restrict the functions data scientists can use to
>> Spark's natural language library. So, Spark streaming with NLTK looks like
>> the right option, from the perspective of fast data processing and data
>> science flexibility.
>>
>> Regards,
>> Ashish
>>
> --
> Twitter: https://twitter.com/holdenkarau
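
On the "right Spark operators" part of the question: one possibility (my assumption, not something anyone in this thread proposed) is to keep the transformations in Scala and hand each micro-batch to an external Python process with RDD.pipe, called from inside DStream.transform. The Python side is then just a stdin/stdout worker. Below is a minimal sketch of such a worker. The names default_tokenizer, process_lines and main are hypothetical, and wordpunct_tokenize is picked only because it needs no NLTK corpus download; the sketch falls back to str.split when NLTK is not installed.

```python
import sys
from typing import Callable, Iterable, Iterator, List


def default_tokenizer() -> Callable[[str], List[str]]:
    """Return NLTK's wordpunct_tokenize if available, else plain str.split."""
    try:
        # Regex-based tokenizer; does not require downloading NLTK data.
        from nltk.tokenize import wordpunct_tokenize
        return wordpunct_tokenize
    except ImportError:
        # Assumption for this sketch: degrade to whitespace splitting.
        return str.split


def process_lines(lines: Iterable[str],
                  tokenize: Callable[[str], List[str]]) -> Iterator[str]:
    """Turn each input line into one output line of tab-separated tokens."""
    for line in lines:
        text = line.rstrip("\n")
        yield "\t".join(tokenize(text))


def main() -> None:
    """Read records from stdin and write one result line per record."""
    tokenize = default_tokenizer()
    for out in process_lines(sys.stdin, tokenize):
        sys.stdout.write(out + "\n")
    sys.stdout.flush()

# When saved as a worker script (hypothetical name: nltk_worker.py),
# end the file with a call to main().
```

The Scala job would then pipe each batch's RDD through a command such as "python nltk_worker.py" and get back an RDD of strings, one per input record. Every record crosses a process boundary as a line of text, so records must not contain embedded newlines, and this should not be expected to be fast.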