Hi Adrian,

Thanks for the briefing on the Spark support problem. We could always ask the community for help by providing enough context. Maybe we can add a page to the Zipkin wiki and tweet about it, to see if we can attract some contributors. From my experience, if we have a great idea, it won't take long to find help from the open source community.
Regards,

Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Tue, Mar 19, 2019 at 8:51 AM Adrian Cole <[email protected]> wrote:
>
> Hi, team.
>
> A long time ago, we arbitrarily used spark for dependency link
> aggregation (porting the work from Eirik's hadoop job). The initial
> spark job was created incomplete then abandoned by the author. I've
> tried a lot to support it, but it has been perpetual maintenance and
> most of us have no idea how to support it. Yet, we get a lot of user
> questions about it and the support load is higher than most of our
> projects.
>
> The Elasticsearch part is landmines from the "wan only" stuff, to them
> having a narrow supported range of versions. It is rev-locked to a JRE
> (even if will change later). We've had users complain about CVE
> maintenance and actively ask for a non-spark option. General support
> comes in questions about cluster distribution which no-one knows the
> answer to. I've recently in desperation added a change to help show
> where Spark support is.
>
> https://github.com/openzipkin/zipkin-dependencies/pull/133
>
> All this said, despite the problems running distributed or with
> elasticsearch, most can start the zipkin-dependencies job as a
> one-shot cron job without much help.
>
> I think we have to be honest about the fact that since this project
> started, we've rarely had anyone able to support it. I hope we can get
> out of the mutually disappointing support swamp. Does anyone have any
> ideas?
>
> I would like to think someone could come in and save us, but seems we
> should also consider other tools as that usually doesn't happen, and
> one person saving us isn't sustainable (usually we need a few people
> to know a tool in order to realistically support it). It is possible
> to recruit for this, but we need significant close buy-in from people
> who know spark imho, like actually helping with support, if we want to
> continue this path.
>
> I know there's a Kafka streaming option [1]. I also know some have
> used Flink, and some have had interest in Pulsar. I think we should
> have streaming options, but fact is many don't use any buffer like
> Kafka (direct http), which leads me to think we still need an
> after-the-fact option (pull from storage). Moreover spark's embedded
> mode is nice as it can be treated as a dumb cron job.
>
> Looking for ideas,
> -A
>
> [1] https://github.com/sysco-middleware/zipkin-dependencies-streaming
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
