Parallel dynamic partitioning producing duplicated data

2016-11-30 Thread Mehdi Ben Haj Abbes
Hi folks, I have a Spark job that reads a CSV file into a DataFrame. I register that DataFrame as a temp table, then write it to a Hive external table (using Parquet format for storage). I'm using this kind of command: hiveContext.sql("INSERT INTO TABLE t

Re: equalTo isin not working as expected with a constructed column with DataFrames

2016-02-18 Thread Mehdi Ben Haj Abbes
Hi, I forgot to mention that I'm using version 1.5.1.
Regards,
On Thu, Feb 18, 2016 at 4:20 PM, Mehdi Ben Haj Abbes <mehdi.ab...@gmail.com> wrote:
> Hi folks,
>
> I have a DataFrame with, let's say, this schema:
> -dealId,
> -ptf,
> -ts
> from it I derive another

Re: Number of batches in the Streaming Statics visualization screen

2016-01-29 Thread Mehdi Ben Haj Abbes
>
> Regards,
> - Terry
>
> On Fri, Jan 29, 2016 at 5:45 PM, Mehdi Ben Haj Abbes <mehdi.ab...@gmail.com> wrote:
>
>> Hi folks,
>>
>> I have a streaming job running for more than 24 hours. It seems that
>> there is a limit on the number of the b

Number of batches in the Streaming Statics visualization screen

2016-01-29 Thread Mehdi Ben Haj Abbes
Saturday. I will have the batches that ran over the previous 24 hours, and today it showed only those from the previous 3 hours. Any help would be much appreciated. -- Mehdi BEN HAJ ABBES