Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-24 Thread DB Tsai
+1 on exposing the APIs for columnar processing support. I understand that the scope of this SPIP doesn't cover AI / ML use-cases. But I saw a good performance gain when I converted data from rows to columns to leverage SIMD architectures in a POC ML application. With the exposed columnar

Re: dynamic allocation manager in SS

2019-05-24 Thread Stavros Kontopoulos
Btw the heuristics for batch mode (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L289) vs streaming (

Re: dynamic allocation manager in SS

2019-05-24 Thread Stavros Kontopoulos
I am on k8s, where there is no support yet AFAIK; there is WIP on the shuffle service. So, in your experience, are there no issues with using the batch dynamic allocation version, like there were before with DStreams as described in the related JIRA? On Fri, 24 May 2019, 8:28 PM, user Gabor

Re: dynamic allocation manager in SS

2019-05-24 Thread Stavros Kontopoulos
Yes, nothing happens. In this case, it could propagate info to the resource manager to scale down the number of executors, no? Just a thought. On Fri, 24 May 2019, 7:17 PM, user Gabor Somogyi < gabor.g.somo...@gmail.com> wrote: > Structured Streaming works differently. If no data arrives

Re: dynamic allocation manager in SS

2019-05-24 Thread Gabor Somogyi
It scales down with YARN. Not sure how you've tested. On Fri, 24 May 2019, 19:10 Stavros Kontopoulos, < stavros.kontopou...@lightbend.com> wrote: > Yes, nothing happens. In this case, it could propagate info to the resource > manager to scale down the number of executors, no? Just a thought. > >

Re: dynamic allocation manager in SS

2019-05-24 Thread Gabor Somogyi
Structured Streaming works differently. If no data arrives, no tasks are executed (just had a case in this area). BR, G On Fri, 24 May 2019, 18:14 Stavros Kontopoulos, < stavros.kontopou...@lightbend.com> wrote: > Hi, > > Some time ago the streaming dynamic allocation part was added in

dynamic allocation manager in SS

2019-05-24 Thread Stavros Kontopoulos
Hi, Some time ago the streaming dynamic allocation part was added in DStreams (https://issues.apache.org/jira/browse/SPARK-12133) to address the issues with the batch-based one. Should this be ported to Structured Streaming? Thoughts? AFAIK there is no support for it in SS. Best, Stavros

Custom datasource: when to acquire and release a lock?

2019-05-24 Thread Abhishek Somani
Hi experts, I am trying to create a custom Spark Datasource (v1) to read from a transactional data endpoint, and I need to acquire a lock with the endpoint before fetching data and release the lock after reading. Note that the lock acquisition and release need to happen in the driver JVM. I have

subscribe

2019-05-24 Thread Sarvesh N