Re: Implementing Upsert logic Through Streaming

2019-06-30 Thread Sachit Murarka
Hi Chris, I have to make sure my DB has the updated value for any record at a given point in time. Say the following is the data; I have to take the 4th row for EmpId 2. Also, if any Emp details are already there in Oracle, I have to update them with the latest value from the stream. EmpId, salary, timestamp 1, 1000
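The requirement above — keep only the latest row per EmpId and upsert it into Oracle — can be sketched with Structured Streaming's `foreachBatch`. This is an illustrative sketch only: the names `empStream` and `upsertToOracle` are assumptions, not from the thread, and the Oracle MERGE itself is left abstract because `DataFrameWriter.jdbc` only supports append/overwrite modes.

```scala
// Sketch: assumes a streaming DataFrame `empStream` with columns
// EmpId, salary, timestamp. `upsertToOracle` is a hypothetical helper.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window

def latestPerKey(df: DataFrame): DataFrame = {
  // Keep only the most recent row per EmpId within this micro-batch.
  val w = Window.partitionBy(col("EmpId")).orderBy(col("timestamp").desc)
  df.withColumn("rn", row_number.over(w))
    .filter(col("rn") === 1)
    .drop("rn")
}

empStream.writeStream
  .foreachBatch { (batch: DataFrame, _: Long) =>
    val latest = latestPerKey(batch)
    // Upsert into Oracle, e.g. by issuing a MERGE over JDBC per batch;
    // the plain JDBC writer cannot express MERGE, so this part is
    // hand-rolled (details omitted).
    upsertToOracle(latest)
  }
  .start()
```

Deduplicating inside `foreachBatch` handles the "take the 4th row for EmpId 2" case; the MERGE handles records already present in Oracle.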

Re: Implementing Upsert logic Through Streaming

2019-06-30 Thread Chris Teoh
Just thinking on this: if your needs can be addressed using batch instead of streaming, that would be a viable option. A lambda architecture approach seems like a possible solution. On Sun., 30 Jun. 2019, 9:54 am Chris Teoh, wrote: > Not sure what your needs are here. > > If you can

k8s orchestrating Spark service

2019-06-30 Thread Pat Ferrel
We're trying to set up a system that includes Spark. The rest of the services have good Docker containers and Helm charts to start from. Spark, on the other hand, is proving difficult. We forked a container and have tried to create our own chart, but are having several problems with this. So back to

Re: Map side join without broadcast

2019-06-30 Thread Rahul Nandi
You can implement a custom partitioner to do the bucketing. On Sun, Jun 30, 2019 at 5:15 AM Chris Teoh wrote: > The closest thing I can think of here is if you have both dataframes > written out using buckets. Hive uses this technique for join optimisation > such that both datasets of the same
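A custom partitioner that emulates bucketing might look like the sketch below: both RDDs are partitioned with the same hash scheme, so matching keys land in the same partition on each side and the subsequent join needs no extra shuffle. The class name and partition count are illustrative.

```scala
// Illustrative custom Partitioner for co-partitioning two RDDs by key.
import org.apache.spark.Partitioner

class HashBucketPartitioner(override val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = {
    // Non-negative modulo, so negative hashCodes still map into range.
    val h = key.hashCode % numPartitions
    if (h < 0) h + numPartitions else h
  }
}

// Usage sketch: partition both sides identically, then join.
// val left  = leftRdd.partitionBy(new HashBucketPartitioner(32))
// val right = rightRdd.partitionBy(new HashBucketPartitioner(32))
// left.join(right)  // co-partitioned, so neither side is reshuffled
```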

Re: Map side join without broadcast

2019-06-30 Thread jelmer
Does something like the code below make any sense, or would there be a more efficient way to do it?

val wordsOnOnePartition = input
  .map { word => Math.abs(word.id.hashCode) % numPartitions -> word }
  .partitionBy(new PartitionIdPassthrough(numPartitions))
val indices =
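The `PartitionIdPassthrough` class referenced above isn't shown in the thread; a minimal version consistent with the snippet, assuming the key has already been computed as the desired partition id, would be:

```scala
// Passthrough partitioner: the map step already produced
// Math.abs(word.id.hashCode) % numPartitions as the key, so the
// partitioner just returns the key unchanged.
import org.apache.spark.Partitioner

class PartitionIdPassthrough(override val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = key.asInstanceOf[Int]
}
```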