Unsubscribe

2019-01-18 Thread Huy Banh

Re: Troubleshooting "Task not serializable" in Spark/Scala environments

2015-09-22 Thread Huy Banh
The header should already be sent from the driver to the workers by Spark, which is why it works in spark-shell. In the Scala IDE, the code lives inside an app class, so you need to check whether that app class is serializable. On Tue, Sep 22, 2015 at 9:13 AM Alexis Gillain < alexis.gill...@googlemail.com> wrote: >
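A minimal Scala sketch of the situation described above (class and field names are hypothetical, not from the original thread): referencing a field of a non-serializable enclosing class inside a transformation captures the whole instance in the closure, which triggers "Task not serializable"; copying the value into a local val (or making the class Serializable) avoids it.

    import org.apache.spark.SparkContext

    // A plain class that is NOT Serializable (hypothetical example).
    class App(sc: SparkContext) {
      val header = "id,name"

      def broken(): Array[String] = {
        val rdd = sc.parallelize(Seq("id,name", "1,a", "2,b"))
        // Throws "Task not serializable": `header` pulls the whole App instance into the closure.
        rdd.filter(line => line != header).collect()
      }

      def fixed(): Array[String] = {
        val localHeader = header // copy into a local val; only the String is captured
        val rdd = sc.parallelize(Seq("id,name", "1,a", "2,b"))
        rdd.filter(line => line != localHeader).collect()
      }
    }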

Re: Using Spark for portfolio manager app

2015-09-20 Thread Huy Banh
Hi Thuy, You can check RDD.lookup(). It requires that the RDD is partitioned and, of course, cached in memory. Or you may consider a distributed cache like Ehcache or AWS ElastiCache. I think external storage is an option, too, especially NoSQL databases; they can handle updates at high speed, at
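A short sketch of the RDD.lookup() suggestion (the keys and values are made up for illustration): lookup() is defined on pair RDDs, and when the RDD has a partitioner it only scans the partition that can hold the key, so partitioning and caching keep per-key reads fast.

    import org.apache.spark.HashPartitioner

    val positions = sc.parallelize(Seq(("AAPL", 120.0), ("GOOG", 95.0), ("MSFT", 210.0)))
      .partitionBy(new HashPartitioner(8)) // lookup() then scans only the matching partition
      .cache()                             // keep it in memory so repeated lookups stay cheap

    positions.count()                      // materialize the cache
    val apple = positions.lookup("AAPL")   // Seq(120.0)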

Re: word count (group by users) in spark

2015-09-20 Thread Huy Banh
Hi, If your input format is user -> comment, then you could: val comments = sc.parallelize(List(("u1", "one two one"), ("u2", "three four three"))) val wordCounts = comments.flatMap { case (user, comment) => for (word <- comment.split(" ")) yield ((user, word), 1) }.
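The preview above is cut off after the flatMap step; a completed sketch follows, where the reduceByKey step is an assumption about the natural continuation rather than part of the original message.

    val comments = sc.parallelize(List(("u1", "one two one"), ("u2", "three four three")))

    val wordCounts = comments
      .flatMap { case (user, comment) =>
        for (word <- comment.split(" ")) yield ((user, word), 1)
      }
      .reduceByKey(_ + _)   // count occurrences of each (user, word) pair

    wordCounts.collect().foreach(println)
    // e.g. ((u1,one),2), ((u1,two),1), ((u2,three),2), ((u2,four),1)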
