Thanks Shixiong, I read in the documentation as well that duplicates might
exist because of task retries.
On Mon, 1 Apr 2019 at 9:43 PM, Shixiong(Ryan) Zhu
wrote:
> The Kafka source doesn’t support transactions. You may see partial data or
> duplicated data if a Spark task fails.
>
> On Wed, Mar 27,
After following a tutorial on recommender systems using PySpark / Spark ML, I
decided to jump in with my own dataset. I am specifically trying to predict
video suggestions based on an implicit feature for the time a video was
watched. I wrote a generator to produce my dataset. I have a
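For implicit signals like watch time, the usual preparation (the idea behind
ALS with implicitPrefs=True in Spark ML) is to feed the model a preference
strength rather than an explicit rating; the model then derives a confidence
weight c = 1 + alpha * r from it. A minimal sketch in plain Python — the
video length, alpha value, and helper names here are illustrative
assumptions, not from the original post:

```python
# Sketch: turning raw watch times into implicit-feedback inputs.
# The confidence formula c = 1 + alpha * r follows the standard
# implicit-ALS formulation (Spark ML's ALS exposes alpha similarly).
ALPHA = 40.0  # confidence scaling; a common starting point, tune per dataset

def watch_ratio(watched_seconds, video_length_seconds):
    """Fraction of the video watched, capped at 1.0 (rewatches count once)."""
    return min(watched_seconds / video_length_seconds, 1.0)

def confidence(r, alpha=ALPHA):
    """Confidence weight for an implicit preference strength r."""
    return 1.0 + alpha * r

# A user watched 90 seconds of a hypothetical 300-second video.
r = watch_ratio(90, 300)   # preference strength 0.3
c = confidence(r)          # confidence weight 1 + 40 * 0.3
print(r, c)
```

The ratio (rather than raw seconds) keeps long videos from dominating; with
Spark ML you would pass the ratio as the rating column and set alpha on the
ALS estimator.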
Could you try to use $"a" rather than df("a")? The latter one sometimes
doesn’t work.
On Thu, Mar 21, 2019 at 10:41 AM kineret M wrote:
> I’m trying to read a stream using my custom data source (v2, using Spark 2.3),
> and it fails *in the second iteration* with the following exception while
>
The Kafka source doesn’t support transactions. You may see partial data or
duplicated data if a Spark task fails.
On Wed, Mar 27, 2019 at 1:15 AM hemant singh wrote:
> We are using Spark batch to write a DataFrame to a Kafka topic, via the
> Spark write function with write.format(source = Kafka).
> Does
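Since the Kafka sink only gives at-least-once delivery, a common workaround
is to write a unique key with every row and deduplicate on the consumer side.
A minimal sketch of that consumer-side step in plain Python — the "id" field
and record shapes are hypothetical, not from the original thread:

```python
def dedupe_by_key(records, key):
    """Keep only the first record seen for each key, dropping the
    duplicates that a retried Spark task may have re-emitted."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique

# Simulated topic contents: the row with id=2 was written twice
# because its task was retried.
consumed = [
    {"id": 1, "value": "a"},
    {"id": 2, "value": "b"},
    {"id": 2, "value": "b"},  # duplicate produced by the retry
    {"id": 3, "value": "c"},
]
print(len(dedupe_by_key(consumed, "id")))  # 3
```

The same idea works at scale with a `dropDuplicates` over the key column, or
by making the downstream store idempotent (upsert on the key).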
In both cases, I am trying to create a Hive table based on a union of the
same two queries.
I’m not sure how the process of creating the Hive table differs internally.
Regards,
Neeraj
On Sun, Mar 31, 2019 at 1:29 PM Jörn Franke wrote:
> Is the select taking longer or the saving to a file. You
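One thing worth checking in the two variants: in standard SQL, UNION
deduplicates the combined rows (an extra sort/shuffle step in engines like
Hive or Spark), while UNION ALL simply concatenates them, so unioning a query
with itself behaves very differently under each. A small sketch of the
semantics using sqlite3 as a stand-in for Hive — the table and data are made
up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# UNION deduplicates across the two branches, so unioning a query
# with itself returns each row once.
union_rows = conn.execute(
    "SELECT x FROM t UNION SELECT x FROM t").fetchall()

# UNION ALL concatenates without deduplication: every row appears twice.
union_all_rows = conn.execute(
    "SELECT x FROM t UNION ALL SELECT x FROM t").fetchall()

print(len(union_rows), len(union_all_rows))  # 3 6
```

If the Hive table only needs the concatenation, UNION ALL avoids the
deduplication work and is usually the cheaper CREATE TABLE ... AS SELECT.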