Re: Using Spark Accumulators with Structured Streaming

2020-06-07 Thread Something Something
Great. I guess the trick is to use a separate class such as 'StateUpdateTask'. I will try that. My challenge is to convert this into Scala. Will try it out and report back. Thanks for the tips. On Wed, Jun 3, 2020 at 11:56 PM ZHANG Wei wrote: > The following Java code can work in my cluster
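A minimal Scala sketch of the "separate class" trick discussed above. The point of moving the update logic into its own Serializable class (rather than a closure over driver-side state) is that Spark can ship the object to executors without serialization failures. The class name `StateUpdateTask` follows the thread; its fields and methods here are assumptions for illustration, and the round-trip below only simulates what Spark does when it serializes a task.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical task class: keeping the update logic in a standalone
// Serializable class avoids accidentally capturing non-serializable
// driver-side objects in a closure.
class StateUpdateTask(val counterName: String) extends Serializable {
  def update(count: Long, delta: Long): Long = count + delta
}

object SerializationCheck {
  // Round-trip an object through Java serialization, roughly as Spark
  // would when sending the task from the driver to an executor.
  def roundTrip[T <: java.io.Serializable](obj: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val task = roundTrip(new StateUpdateTask("my-accumulator"))
    println(task.counterName)     // survives the round trip
    println(task.update(10L, 5L)) // 15
  }
}
```

If the class fails this round trip locally, it will also fail inside a Spark job, so it is a cheap check before wiring the class into `mapGroupsWithState` or an accumulator-updating transformation.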

Re: unsubscribe

2020-06-07 Thread Wesley
Please send an empty email to user-unsubscr...@spark.apache.org to unsubscribe. Thanks. unsubscribe - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

unsubscribe

2020-06-07 Thread Marian Zsemlye
unsubscribe please Mgr. Marian Zsemlye Lead Software Developer PROFECT Slovakia s.r.o. Prievozská 4D 821 09 Bratislava Mobile: +421 903 401 153 E-Mail: marian.zsem...@profect.sk

unsubscribe

2020-06-07 Thread Arkadiy Ver
unsubscribe

Re: NoClassDefFoundError: scala/Product$class

2020-06-07 Thread charles_cai
org.bdgenomics.adam is one of the components of GATK, and I just downloaded the release version from its GitHub website. However, when I build a new Docker image with Spark 2.4.5 and Scala 2.12.4, it works well, and that makes me confused. root@master2:~# pyspark Python 2.7.17 (default,
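For context on the error in this thread's subject: `NoClassDefFoundError: scala/Product$class` almost always means a jar compiled against Scala 2.11 is running on a Scala 2.12 runtime, because the `*$class` helper classes generated for traits were dropped in 2.12's trait encoding. That would also explain why a rebuild against Scala 2.12.4 "works well". A build.sbt sketch of the fix (the dependency line is illustrative, not taken from the thread):

```scala
// A 2.11-built artifact on a 2.12 runtime triggers
// NoClassDefFoundError: scala/Product$class; pin one Scala version.
scalaVersion := "2.12.4"

// Use %% so sbt appends the matching _2.12 suffix automatically;
// mixing in _2.11 artifacts by hand reintroduces the error.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5" % "provided"
```

The same rule applies to ADAM itself: pick the release artifact whose name carries the `_2.12` (or `_2.11`) suffix matching the Spark build in the image.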

Re: Spark :- Update record in partition.

2020-06-07 Thread ayan guha
Hi, please look at delta.io, which is a companion open-source project. It addresses the exact use case you are after. On Mon, Jun 8, 2020 at 2:35 AM Sunil Kalra wrote: > Hi All, > > If I have to update a record in a partition using Spark, do I have to read > the whole partition and update the row

Re: Structured Streaming using File Source - How to handle live files

2020-06-07 Thread Jungtaek Lim
Hi Nick, I guess that's by design: Spark assumes an input file will not be modified once it is placed on the input path. This makes it easy for Spark to track the list of processed files vs. unprocessed files. If input files could be modified, Spark would have to enumerate all of the files and track
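Given the design Jungtaek describes, the usual workaround (a common pattern, not something prescribed in this thread) is to let the producing process write each file fully in a staging directory and then move it into the watched input path in one atomic step, so the stream never observes a partially written or later-modified file. A minimal sketch with `java.nio.file`; the `/tmp/...` paths are assumptions:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths, StandardCopyOption}

object AtomicDrop {
  // Write the whole file in a staging dir, then rename it into the
  // input dir that the Spark file source is watching.
  def dropFile(name: String, contents: String): Path = {
    val staging = Files.createDirectories(Paths.get("/tmp/staging"))
    val input   = Files.createDirectories(Paths.get("/tmp/spark-input"))
    val tmp = staging.resolve(name)
    Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8))
    // ATOMIC_MOVE makes the file appear in the input dir all at once;
    // both directories must live on the same filesystem for this to hold.
    Files.move(tmp, input.resolve(name), StandardCopyOption.ATOMIC_MOVE)
  }
}
```

The same idea applies when the producer is not JVM-based: write under a temporary name or directory, then `mv` into the input path once the file is complete.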

Structured Streaming using File Source - How to handle live files

2020-06-07 Thread ArtemisDev
We were trying to use structured streaming from a file source, but had problems getting the files read by Spark properly. We have another process generating the data files in the Spark data source directory on a continuous basis. What we observed was that the moment a data file is created

Spark :- Update record in partition.

2020-06-07 Thread Sunil Kalra
Hi All, If I have to update a record in a partition using Spark, do I have to read the whole partition, update the row, and overwrite the partition? Is there a way to update only one row, as in a DBMS? Otherwise a single-row update takes a long time, since it rewrites the whole partition. Thanks, Sunil