I have a Spark Streaming application which reads data from Kafka and saves
the transformation result to HDFS.
The Kafka topic originally has 8 partitions, and I repartition the data
to 100 to increase the parallelism of the Spark job.
Now I am wondering if I increase the kafka partition number
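(Not part of the original message, but the parallelism arithmetic behind that repartition can be sketched as follows; the batch size is a hypothetical number, and the per-partition task model reflects the direct Kafka stream, where Spark creates one task per Kafka partition unless the data is repartitioned.)

```python
# Sketch: with 8 Kafka partitions, each micro-batch runs at most 8 parallel
# tasks; repartition(100) spreads the same records over 100 tasks, at the
# cost of one extra shuffle per batch.
kafka_partitions = 8
repartition_target = 100
batch_records = 1_000_000  # hypothetical records per micro-batch

records_per_task_before = batch_records // kafka_partitions   # 125000
records_per_task_after = batch_records // repartition_target  # 10000

print(records_per_task_before, records_per_task_after)
```

Whether the repartition pays off depends on whether the shuffle cost is smaller than the gain from the extra parallelism.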
Hi Spark Users,
I tried to implement GBT and found that the feature importances computed
when the model was fit differ from those of the same model after it was
saved to storage and loaded back.
I also found that once the persisted model is loaded, saved back again,
and loaded, the feature
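(A simple way to pin down the discrepancy described above is to compare the two importance vectors numerically rather than by eye. The helper below is a generic, Spark-free sketch; in Spark ML the vectors would come from `model.featureImportances.toArray()` before and after the save/load round trip, and the example values here are hypothetical.)

```python
import math

def importances_match(before, after, tol=1e-9):
    """Return True if two feature-importance vectors agree element-wise
    within an absolute tolerance."""
    if len(before) != len(after):
        return False
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(before, after))

# Hypothetical vectors: at fit time vs. after a save/load cycle.
fit_importances = [0.42, 0.33, 0.25]
loaded_importances = [0.42, 0.33, 0.25]

print(importances_match(fit_importances, loaded_importances))
```

If the vectors differ by more than floating-point noise, that points at the persistence path rather than the training itself.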
Hi Dillon,
Thank you for your reply.
mapToPair uses a PairFunction to transform the data into a particular Parquet
format. I have tried replacing the mapToPair() call with other operators
such as count() or collect(), but it didn't work, so I guess the
shuffle write explosion problem has no
Hi Matei,
Thanks for your answer, it's much clearer now. I was not aware of the
time needed for the release preparation.
Best regards,
Bartosz.
On Tue, Nov 6, 2018 at 9:05 AM Matei Zaharia
wrote:
> Hi Bartosz,
>
> This is because the vote on 2.4 has passed (you can see the vote thread on
>
Hi Bartosz,
This is because the vote on 2.4 has passed (you can see the vote thread on the
dev mailing list) and we are just working to get the release into various
channels (Maven, PyPI, etc), which can take some time. Expect to see an
announcement soon once that’s done.
Matei
> On Nov 4,