Re: NLTK with Spark Streaming

2017-12-01 Thread ashish rawat
Thanks Nicholas, but the problem for us is that we want to use the NLTK Python library, since our data scientists train with it. Rewriting the inference logic with some other library would be time-consuming, and in some cases it may not even work because some functions are unavailable.
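A common way to keep the NLTK code as-is in PySpark is to import NLTK inside a `mapPartitions` function, so each executor loads the library (and its data files) locally instead of the driver trying to serialize it. A minimal runnable sketch of that pattern, with a stand-in tokenizer so it runs without NLTK installed (the commented lines show where the real NLTK call would go, assuming NLTK and its corpora are installed on every executor):

```python
def score_partition(rows):
    # In production, import NLTK here so each executor loads it lazily,
    # once per partition, e.g.:
    #   import nltk
    #   tokenize = nltk.tokenize.TreebankWordTokenizer().tokenize
    # Stand-in tokenizer so this sketch runs without NLTK:
    tokenize = str.split
    for text in rows:
        # Emit (text, token_count); replace with the real inference logic.
        yield (text, len(tokenize(text)))

# Driver-side wiring (assumes a running SparkContext `sc`):
#   counts = sc.textFile("hdfs://.../input").mapPartitions(score_partition)
```

The same `mapPartitions` pattern applies to `foreachRDD` in a streaming job; the key point is that the import happens on the executor, not the driver.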

Re: What should LivyUrl be set to when running locally?

2017-12-01 Thread kant kodali
nvm, I see it. It's http://localhost:8998

On Fri, Dec 1, 2017 at 3:28 PM, kant kodali wrote:
> Hi All,
> I am running both spark and livy locally so imagine everything on a local machine.
> What should my livyUrl be set to? I don't see that in the example.
> Thanks!
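For reference, 8998 is Livy's default REST port, so a fully local setup talks to http://localhost:8998. A small sketch of building the session-creation request against that URL (the POST itself is commented out since it needs a running Livy server and the third-party `requests` library; the endpoint path and `kind` field follow Livy's REST API):

```python
import json

# Livy's REST server listens on port 8998 by default.
livy_url = "http://localhost:8998"
sessions_endpoint = livy_url + "/sessions"

# Request body to start an interactive PySpark session.
payload = json.dumps({"kind": "pyspark"})

# With a Livy server running locally (requests is not stdlib):
#   import requests
#   r = requests.post(sessions_endpoint, data=payload,
#                     headers={"Content-Type": "application/json"})
```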

What should LivyUrl be set to when running locally?

2017-12-01 Thread kant kodali
Hi All, I am running both Spark and Livy locally, so imagine everything on a local machine. What should my livyUrl be set to? I don't see that in the example. Thanks!

Re: Getting Message From Structured Streaming Format Kafka

2017-12-01 Thread Daniel de Oliveira Mantovani
Hello Burak, sorry for the delayed answer; you were right. 1) I changed the sql-kafka connector version and that fixed it. 2) That was just a test, and I was also using normal streaming for other things. I was wondering how you knew it was the sql-kafka connector version just from reading the logs.

Re: [Spark streaming] No assigned partition error during seek

2017-12-01 Thread Cody Koeninger
Yeah, don't mix multiple versions of kafka clients. That's not 100% certain to be the cause of your problem, but it can't be helping. As for your comments about async commits, read https://issues.apache.org/jira/browse/SPARK-22486 and if you think your use case is still relevant to others

Re: Writing files to s3 with out temporary directory

2017-12-01 Thread Steve Loughran
Hadoop trunk (i.e. 3.1 when it comes out) has the code to do 0-rename commits: http://steveloughran.blogspot.co.uk/2017/11/subatomic.html If you want to play today, you can build Hadoop trunk & Spark master, plus a little glue JAR of mine to get Parquet to play properly

Re: [Spark streaming] No assigned partition error during seek

2017-12-01 Thread Qiao, Richard
In your case, it looks like it's trying to load two versions of the Kafka clients in the same JVM at runtime; there is a version conflict. About "I dont find the spark async commit useful for our needs", do you mean code like the one below? kafkaDStream.asInstanceOf[CanCommitOffsets].commitAsync(ranges)
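The point of the `commitAsync` call Richard quotes is ordering: capture the offset ranges when the batch arrives, and commit them only after the output action has succeeded, so a failed batch is reprocessed rather than skipped. A stdlib-only sketch of that ordering (the broker interaction is stubbed; in Spark Streaming the real call is the Scala one-liner above):

```python
# Stub of an async offset commit: a real Kafka client would enqueue the
# commit and return immediately rather than blocking on the broker.
committed = []

def commit_async(ranges):
    committed.extend(ranges)

def process_batch(ranges, records, write_output):
    # Write the batch to the sink first...
    write_output(records)
    # ...and only then mark its offsets as done, so a crash between the
    # two steps leads to reprocessing (at-least-once), not data loss.
    commit_async(ranges)

out = []
# A hypothetical offset range: (topic, partition, from_offset, until_offset).
process_batch([("topic", 0, 0, 3)], ["a", "b", "c"], out.extend)
```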