Re: Use case advice

2021-01-20 Thread purav aggarwal
Unsubscribe On Fri, Jan 15, 2021 at 9:52 AM Dilip Desavali wrote: > Unsubscribe >

Re: Use case advice

2021-01-14 Thread Dilip Desavali
Unsubscribe

Re: Use case advice

2021-01-14 Thread muru
You need to make sure the delta-core_2.11-0.6.1.jar file is in your $SPARK_HOME/jars folder. On Thu, Jan 14, 2021 at 4:59 AM András Kolbert wrote: > sorry missed out a bit. Added, highlighted with yellow. > > On Thu, 14 Jan 2021 at 13:54, András Kolbert > wrote: > >> Thanks, Muru, very helpful

Re: Use case advice

2021-01-14 Thread András Kolbert
Sorry, I missed out a bit. Added, highlighted with yellow. On Thu, 14 Jan 2021 at 13:54, András Kolbert wrote: > Thanks, Muru, very helpful suggestion! Delta Lake is amazing, completely > changed a few of my projects! > > One question regarding that. > When I use the following statement, all works

Re: Use case advice

2021-01-14 Thread András Kolbert
Thanks, Muru, very helpful suggestion! Delta Lake is amazing, completely changed a few of my projects! One question regarding that. When I use the following statement, all works fine and I can use delta properly, in the spark context that jupyter initiates automatically. export

Re: Use case advice

2021-01-09 Thread muru
You could try Delta Lake or Apache Hudi for this use case. On Sat, Jan 9, 2021 at 12:32 PM András Kolbert wrote: > Sorry if my terminology is misleading. > > What I meant under driver only is to use a local pandas dataframe (collect > the data to the master), and keep updating that instead of

Re: Use case advice

2021-01-09 Thread András Kolbert
Sorry if my terminology is misleading. What I meant by "driver only" is to use a local pandas dataframe (collect the data to the master) and keep updating that, instead of dealing with a distributed Spark dataframe for holding this data. For example, we have a dataframe with all users and their
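
For readers following the thread, the "driver only" idea above could look roughly like the sketch below: a pandas dataframe held on the driver, with each batch applied as an upsert. This is only an illustration, not the poster's actual code. The `upsert` helper and the `user_id` and `score` columns are hypothetical names chosen for the example, and the tiny frames stand in for the real state and batch data.

```python
import pandas as pd


def upsert(state: pd.DataFrame, batch: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return `state` with the rows of `batch` inserted or overwritten, matched on `key`.

    Hypothetical helper for illustration; not part of pandas or Spark.
    """
    # Stack the existing state and the new batch, batch rows last.
    combined = pd.concat([state, batch], ignore_index=True)
    # keep="last" lets the batch row win whenever a key appears in both frames.
    return combined.drop_duplicates(subset=key, keep="last").reset_index(drop=True)


# Assumed example data: the column names are made up for this sketch.
state = pd.DataFrame({"user_id": [1, 2, 3], "score": [10, 20, 30]})
batch = pd.DataFrame({"user_id": [2, 4], "score": [25, 40]})

# User 2 is updated, user 4 is added, users 1 and 3 are left untouched.
state = upsert(state, batch, key="user_id")
```

In a streaming job, something like this would run once per batch on the collected batch data. Whether that is practical depends on the state fitting comfortably in driver memory, which is the trade-off being discussed in this thread.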

Re: Use case advice

2021-01-09 Thread Artemis User
Could you please clarify what you mean by 1)? The driver is only responsible for submitting the Spark job, not performing it. -- ND On 1/9/21 9:35 AM, András Kolbert wrote: Hi, I would like to get your advice on my use case. I have a few spark streaming applications where I need to keep updating a

Use case advice

2021-01-09 Thread András Kolbert
Hi, I would like to get your advice on my use case. I have a few spark streaming applications where I need to keep updating a dataframe after each batch. Each batch probably affects a small fraction of the dataframe (5k out of 200k records). The options I have been considering so far: 1) keep