What benefits do we really get out of colocation?

2016-12-02 Thread kant kodali
I wonder what benefits I really get if I colocate my Spark worker process and Cassandra server process on each node. I understand the concept of moving compute towards the data instead of moving data towards the computation, but it sounds more like one is trying to optimize for network latency.

Re: Spark sql generated dynamically

2016-12-02 Thread Georg Heiler
Are you sure? I think this is a column-wise and not a row-wise operation. ayan guha wrote on Fri, 2 Dec 2016 at 15:17: > You are looking for window functions. > On 2 Dec 2016 22:33, "Georg Heiler" wrote: > > Hi, > > how can I perform a group

Re: TallSkinnyQR

2016-12-02 Thread Iman Mohtashemi
Thanks again! This is very helpful! Best regards, Iman On Dec 2, 2016 2:49 PM, "Huamin Li" <3eri...@gmail.com> wrote: > Hi Iman, > > You can get my code from https://github.com/hl475/svd/tree/testSVD. In > addition to fixing the index issue for IndexedRowMatrix ( >

Re: TallSkinnyQR

2016-12-02 Thread Huamin Li
Hi Iman, You can get my code from https://github.com/hl475/svd/tree/testSVD. In addition to fixing the index issue for IndexedRowMatrix (https://issues.apache.org/jira/browse/SPARK-8614), I have made the following changes as well: (1) Add tallSkinnySVD and computeSVDbyGram to

RDD getPartitions() size and HashPartitioner numPartitions

2016-12-02 Thread Amit Sela
This might be a silly question, but I wanted to make sure: when implementing my own RDD, if using a HashPartitioner as the RDD's partitioner, the number of partitions returned by the implementation of getPartitions() has to match the number of partitions set in the HashPartitioner, correct?
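For context, here is a plain-Python sketch of the key-to-partition mapping that Spark's HashPartitioner performs (conceptually `nonNegativeMod(key.hashCode, numPartitions)`); the class and method names mirror Spark's but this is an illustration of the semantics, not Spark's actual code. It shows why every partition index the partitioner can return must correspond to a partition exposed by getPartitions():

```python
def non_negative_mod(x: int, mod: int) -> int:
    """Scala-style non-negative modulo. (Python's % is already non-negative
    for a positive modulus; the branch mirrors the JVM semantics.)"""
    r = x % mod
    return r + mod if r < 0 else r

class HashPartitioner:
    """Sketch of Spark's HashPartitioner: maps a key to [0, num_partitions)."""
    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions

    def get_partition(self, key) -> int:
        if key is None:
            return 0  # Spark sends null keys to partition 0
        return non_negative_mod(hash(key), self.num_partitions)

p = HashPartitioner(8)
parts = [p.get_partition(k) for k in ["a", "b", "c", None, -5]]
# Every returned index is < 8, so getPartitions() must expose 8 partitions.
assert all(0 <= i < 8 for i in parts)
```

Since get_partition can return any index up to numPartitions - 1, an RDD whose getPartitions() returned fewer partitions than its HashPartitioner's numPartitions would route records to non-existent partitions, so yes, the two must match.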

Re: TallSkinnyQR

2016-12-02 Thread Iman Mohtashemi
Great, thanks! Where can I get the latest with the bug fixes? Best regards, Iman On Fri, Dec 2, 2016 at 10:54 AM Huamin Li <3eri...@gmail.com> wrote: > Hi, > > There seems to be a bug in the section of code that converts the RowMatrix > format back into IndexedRowMatrix format. > > For RowMatrix,

Re: TallSkinnyQR

2016-12-02 Thread Huamin Li
Hi, There seems to be a bug in the section of code that converts the RowMatrix format back into IndexedRowMatrix format. For RowMatrix, I think the singular values and right singular vectors (not the left singular vectors U) that computeSVD computes are correct when using multiple

Re: Kafka 0.10 & Spark Streaming 2.0.2

2016-12-02 Thread Jacek Laskowski
Hi, What's the entire spark-submit + Spark properties you're using? Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Fri, Dec 2, 2016 at 6:28 PM, Gabriel

Re: RDD flatmap to multiple key/value pairs

2016-12-02 Thread Michal Šenkýř
Hello im281, The transformations equivalent to the first mapper would look like this in Java: .flatMap(line -> Arrays.asList(line.split(" ")).iterator()) .filter(word -> Character.isUpperCase(word.charAt(0))) .mapToPair(word -> new Tuple2<>(word, 1)) The second mapper would look more

Re: Kafka 0.10 & Spark Streaming 2.0.2

2016-12-02 Thread Gabriel Perez
I had it set up with three nodes, a master and 2 slaves. Is there anything that would tell me it was in local mode? I also added the --deploy-mode cluster flag and saw the same results. Thanks, Gabe From: Mich Talebzadeh Date: Friday, December 2, 2016 at 12:26 PM

Re: Kafka 0.10 & Spark Streaming 2.0.2

2016-12-02 Thread Mich Talebzadeh
In this POC of yours, are you running this app with Spark in local mode by any chance? Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Kafka 0.10 & Spark Streaming 2.0.2

2016-12-02 Thread Gabriel Perez
We actually ended up reverting back to 0.9.0 in my testing environment because we found other products weren't ready for 0.10 as well, so I am not able to create those snapshots. Hopefully I don't see the same issue with 0.9.0. Thank you for your help though. Thanks, Gabe From: Jacek

Re: Kafka 0.10 & Spark Streaming 2.0.2

2016-12-02 Thread Jacek Laskowski
Hi, Can you post a screenshot of the Executors and Streaming tabs? Jacek On 2 Dec 2016 5:54 p.m., "Gabriel Perez" wrote: > Hi, > > The total partitions are 128 and I can tell it's one executor because in > the consumer list for Kafka I see only one thread pulling

Re: Kafka 0.10 & Spark Streaming 2.0.2

2016-12-02 Thread Gabriel Perez
Hi, The total partitions are 128, and I can tell it's one executor because in the consumer list for Kafka I see only one thread pulling, and in the master Spark UI the executor thread id is showing as 0 and that's it. Thanks, Gabe From: Jacek Laskowski Date: Friday,

Re: Kafka 0.10 & Spark Streaming 2.0.2

2016-12-02 Thread Jacek Laskowski
Hi, How many partitions does the topic have? How do you check how many executors read from the topic? Jacek On 2 Dec 2016 2:44 p.m., "gabrielperez2484" wrote: Hello, I am trying to perform a POC between Kafka 0.10 and Spark 2.0.2. Currently I am running into an

Re: TallSkinnyQR

2016-12-02 Thread Iman Mohtashemi
I have a different question that might be trivial for you (although not to me :)). Maybe you can answer this? Here is a MapReduce example implemented in Java. It reads each line of text and, for each word in the line, determines if it starts with an upper-case letter. If so, it creates a key-value

Re: TallSkinnyQR

2016-12-02 Thread Iman Mohtashemi
Ok thanks. On Fri, Dec 2, 2016 at 8:19 AM Sean Owen wrote: > I tried, but enforcing the ordering changed a fair bit of behavior and I > gave up. I think the way to think of it is: a RowMatrix has whatever > ordering you made it with, so you need to give it ordered rows if

Re: TallSkinnyQR

2016-12-02 Thread Sean Owen
I tried, but enforcing the ordering changed a fair bit of behavior and I gave up. I think the way to think of it is: a RowMatrix has whatever ordering you made it with, so you need to give it ordered rows if you're going to use a method like the QR decomposition. That works. I don't think the QR

Re: TallSkinnyQR

2016-12-02 Thread Iman Mohtashemi
Hi guys, Was this bug ever resolved? Iman On Fri, Nov 11, 2016 at 9:59 AM Iman Mohtashemi wrote: > Yes this would be helpful, otherwise the Q part of the decomposition is > useless. One can use that to solve the system by transposing it and > multiplying with b and
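As background on why an ordered Q matters here: once you have A = QR with rows in a known order, you can solve the least-squares system A x = b by computing Qᵀb and back-substituting through the upper-triangular R, which is the "transpose and multiply with b" idea mentioned above. A tiny pure-Python Gram–Schmidt sketch of that idea (numerically naive, assumes full column rank, for illustration only, not Spark's implementation):

```python
def qr(cols):
    """Classical Gram-Schmidt QR. cols: list of columns of A (lists of
    floats). Returns (Q columns, R) with A = Q R and R upper-triangular."""
    n = len(cols)
    q_cols, R = [], [[0.0] * n for _ in range(n)]
    for j, a in enumerate(cols):
        v = a[:]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(qk * ak for qk, ak in zip(q, a))  # projection onto q_i
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]  # remove that component
        R[j][j] = sum(vk * vk for vk in v) ** 0.5
        q_cols.append([vk / R[j][j] for vk in v])
    return q_cols, R

def solve_via_qr(cols, b):
    """Solve A x = b in the least-squares sense: R x = Q^T b."""
    q_cols, R = qr(cols)
    qtb = [sum(qk * bk for qk, bk in zip(q, b)) for q in q_cols]
    n = len(cols)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution through R
        x[i] = (qtb[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x

# Tall-skinny A = [[1, 1], [0, 1], [0, 0]] stored column-wise; b = A @ [2, 3].
cols = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
x = solve_via_qr(cols, [5.0, 3.0, 0.0])
assert all(abs(xi - ei) < 1e-9 for xi, ei in zip(x, [2.0, 3.0]))
```

This is exactly why Q with scrambled row order is useless for solving: Qᵀb pairs each row of Q with the corresponding entry of b, so the rows must come back in the order they went in.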

RDD flatmap to multiple key/value pairs

2016-12-02 Thread im281
Here is a MapReduce example implemented in Java. It reads each line of text and, for each word in the line, determines if it starts with an upper-case letter. If so, it creates a key-value pair. public class CountUppercaseMapper extends Mapper { @Override
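For readers following along, here is a compact plain-Python sketch of the same mapper/reducer logic (emit (word, 1) for words starting with an upper-case letter, then sum per key); the function name is illustrative and this is not the Hadoop API:

```python
from collections import Counter

def count_uppercase_words(lines):
    """Mapper: emit (word, 1) for each word whose first letter is upper case.
    Reducer: sum the counts per word (Counter plays the shuffle+reduce role)."""
    pairs = ((word, 1)
             for line in lines
             for word in line.split()
             if word[:1].isupper())
    counts = Counter()
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

result = count_uppercase_words(["The quick Fox", "the Fox jumps"])
assert result == {"The": 1, "Fox": 2}
```

In Spark terms, the generator expression is the flatMap + filter + map-to-pair stage and the Counter accumulation corresponds to reduceByKey.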

Re: Spark sql generated dynamically

2016-12-02 Thread ayan guha
You are looking for window functions. On 2 Dec 2016 22:33, "Georg Heiler" wrote: > Hi, > > how can I perform a group-wise operation in Spark more elegantly? Possibly > dynamically generate SQL? Or would you suggest a custom UDAF? >
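To illustrate what a window function buys you for group-wise operations, here is a standard-SQL sketch run through sqlite3 rather than Spark SQL (the RANK() OVER syntax is essentially the same in both; the table and column names are made up, and window functions need SQLite >= 3.25):

```python
import sqlite3

# One window-function query does the per-group work; no need to
# dynamically generate a separate SQL statement per group.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (dept TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('a', 10), ('a', 30), ('a', 20),
        ('b', 5),  ('b', 15);
""")
rows = conn.execute("""
    SELECT dept, amount,
           RANK() OVER (PARTITION BY dept ORDER BY amount DESC) AS rnk
    FROM sales
    ORDER BY dept, rnk
""").fetchall()
assert rows[0] == ('a', 30, 1)  # top sale within dept 'a'
assert rows[3] == ('b', 15, 1)  # top sale within dept 'b'
```

Unlike GROUP BY, the window keeps every input row and attaches the group-wise result to it, which is usually what "group-wise operation" questions like this one are after.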

Kafka 0.10 & Spark Streaming 2.0.2

2016-12-02 Thread gabrielperez2484
Hello, I am trying to perform a POC between Kafka 0.10 and Spark 2.0.2. Currently I am running into an issue where only one executor ("kafka consumer") is reading from the topic, which is causing performance to be really poor. I have tried adding "--num-executors 8" both in the script to execute
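For reference, a sketch of the kind of spark-submit invocation being discussed; the flag names are real Spark 2.x options, but the values, main class, and jar name are illustrative placeholders:

```shell
# Illustrative only: ask YARN for 8 executors and run in cluster mode.
# With the Kafka 0.10 direct stream, read parallelism is bounded by the
# topic's partition count, spread across however many executors run tasks.
# com.example.KafkaPoc and kafka-poc.jar are hypothetical names.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 8 \
  --executor-cores 2 \
  --class com.example.KafkaPoc \
  kafka-poc.jar
```

If such a submission still shows a single consumer, the later replies in this thread point at the usual suspect: the app actually running with a local master rather than on the cluster.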

Spark sql generated dynamically

2016-12-02 Thread Georg Heiler
Hi, how can I perform a group-wise operation in Spark more elegantly? Possibly dynamically generate SQL? Or would you suggest a custom UDAF? http://stackoverflow.com/q/40930003/2587904 Kind regards, Georg

Re: Spark 2.x Pyspark Spark SQL createDataframe Error

2016-12-02 Thread Vinayak Joshi5
Thanks Michal. I have submitted a Spark issue and PR based on my understanding of why this changed in Spark 2.0. If interested you can follow it on https://issues.apache.org/jira/browse/SPARK-18687 Regards, Vinayak. From: Michal Šenkýř To: Vinayak

Re: Accessing log for lost executors

2016-12-02 Thread Benyi Wang
Usually your executors were killed by YARN for exceeding the memory limit. You can check the NodeManager's log to see if your application got killed, or use the command "yarn logs -applicationId " to download the logs. On Thu, Dec 1, 2016 at 10:30 PM, Nisrina Luthfiyati < nisrina.luthfiy...@gmail.com>
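A sketch of those diagnosis steps; the application id below is a placeholder, while `yarn logs -applicationId` and the Spark-on-YARN memory-overhead setting are real options:

```shell
# Fetch aggregated logs for a finished application (placeholder app id):
yarn logs -applicationId application_1480000000000_0001 > app.log

# Look for the YARN kill message, typically along the lines of
# "Container killed by YARN for exceeding memory limits":
grep -i "exceeding memory" app.log

# If that is the cause, a common remedy is raising the off-heap headroom
# at submit time, e.g.:
#   --conf spark.yarn.executor.memoryOverhead=1024
```

The NodeManager log on the host that ran the lost executor usually records the same kill decision.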