If there is a Spark ingestion option, would you be open to moving away from
Hadoop, or are there other factors that might prevent a move?
Regards,
Will Product@Imply
On Mon, Aug 29, 2022 at 8:15 AM Will Lauer wrote:
@Abhishek, I haven't spoken with our Hadoop team recently about Hadoop 3
stability, so I can't say for sure, but I understand the need to migrate
and all the dependency headaches involved in NOT migrating. At this point,
I expect Druid moving to Hadoop 3 makes sense. I suspect that _we_ won't be
read
Gian mentioned MSQ. The new MSQ work is exciting and powerful for Druid
ingestion. If the data needs cleaning, we would expect users to employ
something like Spark to do that task, then emit clean data to Kafka or files,
which Druid MSQ can ingest. That is:
Dirty data —> Spark —> Kafka/Files —> Druid (MSQ ingestion)
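The pipeline above can be sketched in code. This is a hypothetical, self-contained Python stand-in for the Spark cleaning step (a real job would use the Spark DataFrame API and write to a Kafka topic or to files that MSQ then ingests); the field names and the `clean_record` helper are illustrative, not part of any Druid or Spark API.

```python
# Hypothetical sketch of the "Dirty data -> Spark -> Kafka/Files" cleaning
# step. A real deployment would express this with Spark and a Kafka sink;
# plain Python is used here only to show the shape of the transformation.

def clean_record(record):
    """Drop records with no timestamp and normalize field names/types."""
    if not record.get("timestamp"):
        return None  # unparseable rows are filtered out, never reach Druid
    return {
        "timestamp": record["timestamp"].strip(),
        "user": record.get("user", "unknown").lower(),
        "value": float(record.get("value", 0)),
    }

def clean_batch(dirty_records):
    """Return only the records that survive cleaning, ready for ingestion."""
    cleaned = (clean_record(r) for r in dirty_records)
    return [r for r in cleaned if r is not None]

dirty = [
    {"timestamp": "2022-08-29T08:15:00Z ", "user": "Alice", "value": "3"},
    {"timestamp": "", "user": "bob"},  # missing timestamp -> dropped
]
print(clean_batch(dirty))
```

The point of the split is that Druid only ever sees clean, well-typed rows; all the messy, record-level repair logic stays in the upstream Spark job.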
Hi Julian,
Thank you so much for your contribution on Spark support. As an existing
committer, I would like to help get the Spark connector merged into OSS
(including PR reviews and any other development work that may be needed).
We can move the conversation regarding Spark support into a new thre
For Spark support, the connector I wrote remains functional but I haven’t
updated the PR for six months or so since it didn’t seem like there was an
appetite for review. If that’s changing I could migrate back some more recent
changes to the OSS PR. Even with an up-to-date patch though I see two
Yes. We should deprecate it first, which is similar to dropping the support
(no more active development), but we will still ship it for a release or
two. In a way, we are already in that mode to a certain extent. Many
features are being built with native ingestion as a first-class citizen.
E.g. range
It's always good to deprecate things for some time prior to removing them,
so we don't need to (nor should we) remove Hadoop 2 support right now. My
vote is that in this upcoming release, we should deprecate it. The main
problem in my eyes is the one Abhishek brought up: the dependency
management s
I was thinking that moving from Hadoop 2 to Hadoop 3 will be a
lower-resistance path than moving from Hadoop to Spark. Even if we get that
PR merged, it will take a good amount of time for the Spark integration to
reach the same level of maturity as Hadoop or native ingestion. BTW, I am
not making an argument against
I am sure there are other companies out there who are still on Hadoop 2.x,
with migration to Hadoop 3.x being a no-go.
If Druid were to drop support for Hadoop 2.x completely, I am afraid it
would prevent those users from updating to newer versions of Druid, which
would be a shame.
FWIW, we have found in p
Reviving this conversation again.
@Will - Do you still have concerns about HDFS stability? Hadoop 3 has been
around for some time now and is very stable as far as I know.
The dependencies coming from Hadoop 2 are also old enough that they cause
dependency scans to fail (e.g., the Log4j 1.x dependencies).
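As a sketch of the kind of workaround users end up carrying in the meantime, assuming a Maven build that pulls in `hadoop-client` transitively, the EOL Log4j 1.x artifact can be excluded like this (the version shown is illustrative; the exact module carrying the dependency varies by build):

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.10.2</version>
  <exclusions>
    <!-- Hadoop 2 still depends on EOL Log4j 1.x, which trips dependency scans -->
    <exclusion>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Exclusions like this have to be repeated wherever Hadoop 2 artifacts appear, which is part of the dependency-management pain being described here.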