Phrase search using Apache Spark over a huge amount of text in files

2019-05-28 Thread Sandeep Giri
Dear Spark Users, if you want to search for a list of phrases (approx. 10,000 of them, each 1 to 6 words long) in a large amount of text (approximately 10 GB), how would you go about it? I ended up writing a small RDD-based library: https://github.com/cloudxlab/phrasesearch. I would like to get feedback.
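[Editor's note: a minimal sketch of one way to approach this, not the cloudxlab/phrasesearch implementation; the paths and the plain substring match are illustrative assumptions. The idea is to broadcast the phrase list to every executor and scan the corpus in parallel.]

```scala
import org.apache.spark.sql.SparkSession

// Sketch: broadcast the phrase list, then count matches per phrase.
object PhraseSearchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PhraseSearch").getOrCreate()
    val sc = spark.sparkContext

    // ~10,000 short phrases easily fit in driver and executor memory.
    val phrases = sc.textFile("hdfs:///data/phrases.txt").collect().toSeq
    val phrasesBc = sc.broadcast(phrases)

    val counts = sc.textFile("hdfs:///data/corpus/") // ~10 GB of text
      .flatMap { line =>
        phrasesBc.value.iterator.filter(p => line.contains(p)).map(p => (p, 1L))
      }
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///data/phrase_counts")
    spark.stop()
  }
}
```

Note that checking every line against every phrase is O(lines x phrases); building a multi-pattern matcher such as Aho-Corasick once on the driver and broadcasting it would scale better for large phrase lists.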

Re: Writing to multiple Kafka partitions from Spark

2019-05-28 Thread Femi Anthony
Ok, that worked, thanks for the suggestion. Sent from my iPhone. > On May 24, 2019, at 11:53 AM, SNEHASISH DUTTA wrote: > Hi, all the keys are similar, so they go to the same partition. The key-to-partition distribution depends on the key's hash; add some random number to your key.
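[Editor's note: a hedged sketch of the salting idea suggested above — append a random suffix to an otherwise-constant key so Kafka's default hash partitioner spreads records across partitions. The broker address and topic name are placeholders.]

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{concat, lit, rand}

val spark = SparkSession.builder().appName("SaltedKafkaKeys").getOrCreate()
import spark.implicits._

val events = Seq("event-1", "event-2", "event-3").toDF("value")

events
  // Salt the key: "static-key-0" .. "static-key-99" hash to many partitions.
  .withColumn("key", concat(lit("static-key-"), (rand() * 100).cast("int").cast("string")))
  .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
  .option("topic", "my_topic")                        // placeholder topic
  .save()
```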

Re: Putting records in HBase with Spark - error getting regions.

2019-05-28 Thread Guillermo Ortiz Fernández
After a while, this error appears as well: 19/05/28 11:11:18 ERROR executor.Executor: Exception in task 35.1 in stage 0.0 (TID 265) org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 122 actions: my_table: 122 times, at
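[Editor's note: when the underlying failures are transient, raising the HBase client's retry and timeout budget is a common mitigation. This sketch uses standard HBase client configuration keys with illustrative values; it will not help if the table genuinely cannot be found, as the message below suggests.]

```scala
import org.apache.hadoop.hbase.HBaseConfiguration

// Illustrative values only; tune for your cluster. These are standard
// HBase client keys, not a fix confirmed in this thread.
val conf = HBaseConfiguration.create()
conf.setInt("hbase.client.retries.number", 10) // retry budget per operation
conf.setLong("hbase.client.pause", 200L)       // ms between retries
conf.setInt("hbase.rpc.timeout", 60000)        // per-RPC timeout in ms
```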

Putting records in HBase with Spark - error getting regions.

2019-05-28 Thread Guillermo Ortiz Fernández
I'm executing a load process into HBase with Spark (around 150M records). At the end of the process there are a lot of failed tasks, and I get this error: 19/05/28 11:02:31 ERROR client.AsyncProcess: Failed to get region location org.apache.hadoop.hbase.TableNotFoundException: my_table at
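[Editor's note: a hedged sanity check, not a confirmed fix from the thread. TableNotFoundException during region lookup frequently means the client configuration on the executors points at the wrong cluster or namespace, or the table name differs. The ZooKeeper quorum below is a placeholder.]

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory

// Verify the table is visible with the same configuration the job uses.
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3") // placeholder quorum

val connection = ConnectionFactory.createConnection(conf)
try {
  val admin = connection.getAdmin
  val exists = admin.tableExists(TableName.valueOf("my_table"))
  println(s"my_table exists: $exists")
} finally {
  connection.close()
}
```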

Re: 1 task per executor

2019-05-28 Thread Arnaud LARROQUE
Hi, how many files do you read? Are they splittable? If you have 4 non-splittable files, your dataset will have 4 partitions and you will only see one task per partition, each handled by one executor. Regards, Arnaud On Tue, May 28, 2019 at 10:06 AM Sachit Murarka wrote: > Hi All, > I am using
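[Editor's note: a sketch of the remedy implied by the reply above — check the partition count of the input and repartition if it is too low. The path and target partition count are placeholder assumptions; repartitioning trades a shuffle for parallelism.]

```scala
// Non-splittable inputs (e.g. gzip) yield one partition per file.
val df = spark.read.text("hdfs:///data/input/") // placeholder path
println(s"Input partitions: ${df.rdd.getNumPartitions}")

// Aim for roughly 2-3x the total executor cores in the cluster.
val repartitioned = df.repartition(64)
```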

1 task per executor

2019-05-28 Thread Sachit Murarka
Hi All, I am using Spark 2.2. I have enabled dynamic allocation with 4 executor cores, 4 driver cores, 12 GB of executor memory, and 10 GB of driver memory. In the Spark UI, I see only 1 task launched per executor. Could anyone please help with this? Kind Regards, Sachit Murarka
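[Editor's note: a hedged Scala reconstruction of the configuration described above, using standard Spark property keys. With 4 cores per executor, each executor can run up to 4 concurrent tasks, so a single task per executor usually indicates too few input partitions rather than a dynamic-allocation problem.]

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("DynamicAllocationExample")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true") // required by dynamic allocation
  .config("spark.executor.cores", "4")
  .config("spark.executor.memory", "12g")
  // Driver memory and cores generally must be set at spark-submit time,
  // before the driver JVM starts, not here in application code.
  .getOrCreate()
```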