+1 to adding these 3 formats into dist, under the lib/ directory. This is a step worth trying toward better usability for SQL users. They don't have *any* third-party dependencies and are very small, so I think it's safe to add them.
Best, Jark On Fri, 5 Jun 2020 at 11:14, Jingsong Li <jingsongl...@gmail.com> wrote: > Hi all, > > Considering that 1.11 will be released soon, what about my previous > proposal? Put flink-csv, flink-json and flink-avro under lib. > These three formats are very small and no third party dependence, and they > are widely used by table users. > > Best, > Jingsong Lee > > On Tue, May 12, 2020 at 4:19 PM Jingsong Li <jingsongl...@gmail.com> > wrote: > > > Thanks for your discussion. > > > > Sorry to start discussing another thing: > > > > The biggest problem I see is the variety of problems caused by users' > lack > > of format dependency. > > As Aljoscha said, these three formats are very small and no third party > > dependence, and they are widely used by table users. > > Actually, we don't have any other built-in table formats now... In total > > 151K... > > > > 73K flink-avro-1.10.0.jar > > 36K flink-csv-1.10.0.jar > > 42K flink-json-1.10.0.jar > > > > So, Can we just put them into "lib/" or flink-table-uber? > > It not solve all problems and maybe it is independent of "fat" and > "slim". > > But also improve usability. > > What do you think? Any objections? > > > > Best, > > Jingsong Lee > > > > On Mon, May 11, 2020 at 5:48 PM Chesnay Schepler <ches...@apache.org> > > wrote: > > > >> One downside would be that we're shipping more stuff when running on > >> YARN for example, since the entire plugins directory is shiped by > default. > >> > >> On 17/04/2020 16:38, Stephan Ewen wrote: > >> > @Aljoscha I think that is an interesting line of thinking. the > swift-fs > >> may > >> > be rarely enough used to move it to an optional download. > >> > > >> > I would still drop two more thoughts: > >> > > >> > (1) Now that we have plugins support, is there a reason to have a > >> metrics > >> > reporter or file system in /opt instead of /plugins? They don't spoil > >> the > >> > class path any more. 
> >> > > >> > (2) I can imagine there still being a desire to have a "minimal" > docker > >> > file, for users that want to keep the container images as small as > >> > possible, to speed up deployment. It is fine if that would not be the > >> > default, though. > >> > > >> > > >> > On Fri, Apr 17, 2020 at 12:16 PM Aljoscha Krettek < > aljos...@apache.org> > >> > wrote: > >> > > >> >> I think having such tools and/or tailor-made distributions can be > nice > >> >> but I also think the discussion is missing the main point: The > initial > >> >> observation/motivation is that apparently a lot of users (Kurt and I > >> >> talked about this) on the chinese DingTalk support groups, and other > >> >> support channels have problems when first using the SQL client > because > >> >> of these missing connectors/formats. For these, having additional > tools > >> >> would not solve anything because they would also not take that extra > >> >> step. I think that even tiny friction should be avoided because the > >> >> annoyance from it accumulates of the (hopefully) many users that we > >> want > >> >> to have. > >> >> > >> >> Maybe we should take a step back from discussing the "fat"/"slim" > idea > >> >> and instead think about the composition of the current dist. 
As > >> >> mentioned we have these jars in opt/: > >> >> > >> >> 17M flink-azure-fs-hadoop-1.10.0.jar > >> >> 52K flink-cep-scala_2.11-1.10.0.jar > >> >> 180K flink-cep_2.11-1.10.0.jar > >> >> 746K flink-gelly-scala_2.11-1.10.0.jar > >> >> 626K flink-gelly_2.11-1.10.0.jar > >> >> 512K flink-metrics-datadog-1.10.0.jar > >> >> 159K flink-metrics-graphite-1.10.0.jar > >> >> 1.0M flink-metrics-influxdb-1.10.0.jar > >> >> 102K flink-metrics-prometheus-1.10.0.jar > >> >> 10K flink-metrics-slf4j-1.10.0.jar > >> >> 12K flink-metrics-statsd-1.10.0.jar > >> >> 36M flink-oss-fs-hadoop-1.10.0.jar > >> >> 28M flink-python_2.11-1.10.0.jar > >> >> 22K flink-queryable-state-runtime_2.11-1.10.0.jar > >> >> 18M flink-s3-fs-hadoop-1.10.0.jar > >> >> 31M flink-s3-fs-presto-1.10.0.jar > >> >> 196K flink-shaded-netty-tcnative-dynamic-2.0.25.Final-9.0.jar > >> >> 518K flink-sql-client_2.11-1.10.0.jar > >> >> 99K flink-state-processor-api_2.11-1.10.0.jar > >> >> 25M flink-swift-fs-hadoop-1.10.0.jar > >> >> 160M opt > >> >> > >> >> The "filesystem" connectors ar ethe heavy hitters, there. > >> >> > >> >> I downloaded most of the SQL connectors/formats and this is what I > got: > >> >> > >> >> 73K flink-avro-1.10.0.jar > >> >> 36K flink-csv-1.10.0.jar > >> >> 55K flink-hbase_2.11-1.10.0.jar > >> >> 88K flink-jdbc_2.11-1.10.0.jar > >> >> 42K flink-json-1.10.0.jar > >> >> 20M flink-sql-connector-elasticsearch6_2.11-1.10.0.jar > >> >> 2.8M flink-sql-connector-kafka_2.11-1.10.0.jar > >> >> 24M sql-connectors-formats > >> >> > >> >> We could just add these to the Flink distribution without blowing it > up > >> >> by much. We could drop any of the existing "filesystem" connectors > from > >> >> opt and add the SQL connectors/formats and not change the size of > Flink > >> >> dist. So maybe we should do that instead? 
> >> >> > >> >> We would need some tooling for the sql-client shell script to pick-up > >> >> the connectors/formats up from opt/ because we don't want to add them > >> to > >> >> lib/. We're already doing that for finding the flink-sql-client jar, > >> >> which is also not in lib/. > >> >> > >> >> What do you think? > >> >> > >> >> Best, > >> >> Aljoscha > >> >> > >> >> On 17.04.20 05:22, Jark Wu wrote: > >> >>> Hi, > >> >>> > >> >>> I like the idea of web tool to assemble fat distribution. And the > >> >>> https://code.quarkus.io/ looks very nice. > >> >>> All the users need to do is just select what he/she need (I think > this > >> >> step > >> >>> can't be omitted anyway). > >> >>> We can also provide a default fat distribution on the web which > >> default > >> >>> selects some popular connectors. > >> >>> > >> >>> Best, > >> >>> Jark > >> >>> > >> >>> On Fri, 17 Apr 2020 at 02:29, Rafi Aroch <rafi.ar...@gmail.com> > >> wrote: > >> >>> > >> >>>> As a reference for a nice first-experience I had, take a look at > >> >>>> https://code.quarkus.io/ > >> >>>> You reach this page after you click "Start Coding" at the project > >> >> homepage. > >> >>>> Rafi > >> >>>> > >> >>>> > >> >>>> On Thu, Apr 16, 2020 at 6:53 PM Kurt Young <ykt...@gmail.com> > wrote: > >> >>>> > >> >>>>> I'm not saying pre-bundle some jars will make this problem go > away, > >> and > >> >>>>> you're right that only hides the problem for > >> >>>>> some users. But what if this solution can hide the problem for 90% > >> >> users? > >> >>>>> Would't that be good enough for us to try? > >> >>>>> > >> >>>>> Regarding to would users following instructions really be such a > big > >> >>>>> problem? > >> >>>>> I'm afraid yes. Otherwise I won't answer such questions for at > >> least a > >> >>>>> dozen times and I won't see such questions coming > >> >>>>> up from time to time. During some periods, I even saw such > questions > >> >>>> every > >> >>>>> day. 
> >> >>>>> > >> >>>>> Best, > >> >>>>> Kurt > >> >>>>> > >> >>>>> > >> >>>>> On Thu, Apr 16, 2020 at 11:21 PM Chesnay Schepler < > >> ches...@apache.org> > >> >>>>> wrote: > >> >>>>> > >> >>>>>> The problem with having a distribution with "popular" stuff is > >> that it > >> >>>>>> doesn't really *solve* a problem, it just hides it for users who > >> fall > >> >>>>>> into these particular use-cases. > >> >>>>>> Move out of it and you once again run into exact same problems > >> >>>> out-lined. > >> >>>>>> This is exactly why I like the tooling approach; you have to deal > >> with > >> >>>> it > >> >>>>>> from the start and transitioning to a custom use-case is easier. > >> >>>>>> > >> >>>>>> Would users following instructions really be such a big problem? > >> >>>>>> I would expect that users generally know *what *they need, just > not > >> >>>>>> necessarily how it is assembled correctly (where do get which > jar, > >> >>>> which > >> >>>>>> directory to put it in). > >> >>>>>> It seems like these are exactly the problem this would solve? > >> >>>>>> I just don't see how moving a jar corresponding to some feature > >> from > >> >>>> opt > >> >>>>>> to some directory (lib/plugins) is less error-prone than just > >> >> selecting > >> >>>>> the > >> >>>>>> feature and having the tool handle the rest. > >> >>>>>> > >> >>>>>> As for re-distributions, it depends on the form that the tool > would > >> >>>> take. > >> >>>>>> It could be an application that runs locally and works against > >> maven > >> >>>>>> central (note: not necessarily *using* maven); this should would > >> work > >> >>>> in > >> >>>>>> China, no? > >> >>>>>> > >> >>>>>> A web tool would of course be fancy, but I don't know how > feasible > >> >> this > >> >>>>> is > >> >>>>>> with the ASF infrastructure. > >> >>>>>> You wouldn't be able to mirror the distribution, so the load > can't > >> be > >> >>>>>> distributed. I doubt INFRA would like this. 
> >> >>>>>> > >> >>>>>> Note that third-parties could also start distributing use-case > >> >> oriented > >> >>>>>> distributions, which would be perfectly fine as far as I'm > >> concerned. > >> >>>>>> > >> >>>>>> On 16/04/2020 16:57, Kurt Young wrote: > >> >>>>>> > >> >>>>>> I'm not so sure about the web tool solution though. The concern I > >> have > >> >>>>> for > >> >>>>>> this approach is the final generated > >> >>>>>> distribution is kind of non-deterministic. We might generate too > >> many > >> >>>>>> different combinations when user trying to > >> >>>>>> package different types of connector, format, and even maybe > hadoop > >> >>>>>> releases. As far as I can tell, most open > >> >>>>>> source projects and apache projects will only release some > >> >>>>>> pre-defined distributions, which most users are already > >> >>>>>> familiar with, thus hard to change IMO. And I also have went > >> through > >> >> in > >> >>>>>> some cases, users will try to re-distribute > >> >>>>>> the release package, because of the unstable network of apache > >> website > >> >>>>> from > >> >>>>>> China. In web tool solution, I don't > >> >>>>>> think this kind of re-distribution would be possible anymore. > >> >>>>>> > >> >>>>>> In the meantime, I also have a concern that we will fall back > into > >> our > >> >>>>> trap > >> >>>>>> again if we try to offer this smart & flexible > >> >>>>>> solution. Because it needs users to cooperate with such > mechanism. > >> >> It's > >> >>>>>> exactly the situation what we currently fell > >> >>>>>> into: > >> >>>>>> 1. We offered a smart solution. > >> >>>>>> 2. We hope users will follow the correct instructions. > >> >>>>>> 3. Everything will work as expected if users followed the right > >> >>>>>> instructions. > >> >>>>>> > >> >>>>>> In reality, I suspect not all users will do the second step > >> correctly. 
> >> >>>>> And > >> >>>>>> for new users who only trying to have a quick > >> >>>>>> experience with Flink, I would bet most users will do it wrong. > >> >>>>>> > >> >>>>>> So, my proposal would be one of the following 2 options: > >> >>>>>> 1. Provide a slim distribution for advanced product users and > >> provide > >> >> a > >> >>>>>> distribution which will have some popular builtin jars. > >> >>>>>> 2. Only provide a distribution which will have some popular > builtin > >> >>>> jars. > >> >>>>>> If we are trying to reduce the distributions we released, I would > >> >>>> prefer > >> >>>>> 2 > >> >>>>>> 1. > >> >>>>>> > >> >>>>>> Best, > >> >>>>>> Kurt > >> >>>>>> > >> >>>>>> > >> >>>>>> On Thu, Apr 16, 2020 at 9:33 PM Till Rohrmann < > >> trohrm...@apache.org> > >> >> < > >> >>>>> trohrm...@apache.org> wrote: > >> >>>>>> > >> >>>>>> I think what Chesnay and Dawid proposed would be the ideal > >> solution. > >> >>>>>> Ideally, we would also have a nice web tool for the website which > >> >>>>> generates > >> >>>>>> the corresponding distribution for download. > >> >>>>>> > >> >>>>>> To get things started we could start with only supporting to > >> >>>>>> download/creating the "fat" version with the script. The fat > >> version > >> >>>>> would > >> >>>>>> then consist of the slim distribution and whatever we deem > >> important > >> >>>> for > >> >>>>>> new users to get started. > >> >>>>>> > >> >>>>>> Cheers, > >> >>>>>> Till > >> >>>>>> > >> >>>>>> On Thu, Apr 16, 2020 at 11:33 AM Dawid Wysakowicz < > >> >>>>> dwysakow...@apache.org> <dwysakow...@apache.org> > >> >>>>>> wrote: > >> >>>>>> > >> >>>>>> > >> >>>>>> Hi all, > >> >>>>>> > >> >>>>>> Few points from my side: > >> >>>>>> > >> >>>>>> 1. I like the idea of simplifying the experience for first time > >> users. > >> >>>>>> As for production use cases I share Jark's opinion that in this > >> case I > >> >>>>>> would expect users to combine their distribution manually. 
I > think > >> in > >> >>>>>> such scenarios it is important to understand interconnections. > >> >>>>>> Personally I'd expect the slimmest possible distribution that I > can > >> >>>>>> extend further with what I need in my production scenario. > >> >>>>>> > >> >>>>>> 2. I think there is also the problem that the matrix of possible > >> >>>>>> combinations that can be useful is already big. Do we want to > have > >> a > >> >>>>>> distribution for: > >> >>>>>> > >> >>>>>> SQL users: which connectors should we include? should we > >> include > >> >>>>>> hive? which other catalog? > >> >>>>>> > >> >>>>>> DataStream users: which connectors should we include? > >> >>>>>> > >> >>>>>> For both of the above should we include yarn/kubernetes? > >> >>>>>> > >> >>>>>> I would opt for providing only the "slim" distribution as a > release > >> >>>>>> artifact. > >> >>>>>> > >> >>>>>> 3. However, as I said I think its worth investigating how we can > >> >>>> improve > >> >>>>>> users experience. What do you think of providing a tool, could be > >> e.g. > >> >>>> a > >> >>>>>> shell script that constructs a distribution based on users > choice. > >> I > >> >>>>>> think that was also what Chesnay mentioned as "tooling to > >> >>>>>> assemble custom distributions" In the end how I see the > difference > >> >>>>>> between a slim and fat distribution is which jars do we put into > >> the > >> >>>>>> lib, right? It could have a few "screens". > >> >>>>>> > >> >>>>>> 1. Which API are you interested in: > >> >>>>>> a. SQL API > >> >>>>>> b. DataStream API > >> >>>>>> > >> >>>>>> > >> >>>>>> 2. [SQL] Which connectors do you want to use? [multichoice]: > >> >>>>>> a. Kafka > >> >>>>>> b. Elasticsearch > >> >>>>>> ... > >> >>>>>> > >> >>>>>> 3. [SQL] Which catalog you want to use? > >> >>>>>> > >> >>>>>> ... > >> >>>>>> > >> >>>>>> Such a tool would download all the dependencies from maven and > put > >> >> them > >> >>>>>> into the correct folder. 
In the future we can extend it with > >> >> additional > >> >>>>>> rules e.g. kafka-0.9 cannot be chosen at the same time with > >> >>>>>> kafka-universal etc. > >> >>>>>> > >> >>>>>> The benefit of it would be that the distribution that we release > >> could > >> >>>>>> remain "slim" or we could even make it slimmer. I might be > missing > >> >>>>>> something here though. > >> >>>>>> > >> >>>>>> Best, > >> >>>>>> > >> >>>>>> Dawdi > >> >>>>>> > >> >>>>>> On 16/04/2020 11:02, Aljoscha Krettek wrote: > >> >>>>>> > >> >>>>>> I want to reinforce my opinion from earlier: This is about > >> improving > >> >>>>>> the situation both for first-time users and for experienced users > >> that > >> >>>>>> want to use a Flink dist in production. The current Flink dist is > >> too > >> >>>>>> "thin" for first-time SQL users and it is too "fat" for > production > >> >>>>>> users, that is where serving no-one properly with the current > >> >>>>>> middle-ground. That's why I think introducing those specialized > >> >>>>>> "spins" of Flink dist would be good. > >> >>>>>> > >> >>>>>> By the way, at some point in the future production users might > not > >> >>>>>> even need to get a Flink dist anymore. They should be able to > have > >> >>>>>> Flink as a dependency of their project (including the runtime) > and > >> >>>>>> then build an image from this for Kubernetes or a fat jar for > YARN. 
> >> >>>>>> > >> >>>>>> Aljoscha > >> >>>>>> > >> >>>>>> On 15.04.20 18:14, wenlong.lwl wrote: > >> >>>>>> > >> >>>>>> Hi all, > >> >>>>>> > >> >>>>>> Regarding slim and fat distributions, I think different kinds of > >> jobs > >> >>>>>> may > >> >>>>>> prefer different type of distribution: > >> >>>>>> > >> >>>>>> For DataStream job, I think we may not like fat distribution > >> >>>>>> > >> >>>>>> containing > >> >>>>>> > >> >>>>>> connectors because user would always need to depend on the > >> connector > >> >>>>>> > >> >>>>>> in > >> >>>>>> > >> >>>>>> user code, it is easy to include the connector jar in the user > lib. > >> >>>>>> > >> >>>>>> Less > >> >>>>>> > >> >>>>>> jar in lib means less class conflicts and problems. > >> >>>>>> > >> >>>>>> For SQL job, I think we are trying to encourage user to user pure > >> >>>>>> sql(DDL + > >> >>>>>> DML) to construct their job, In order to improve user experience, > >> It > >> >>>>>> may be > >> >>>>>> important for flink, not only providing as many connector jar in > >> >>>>>> distribution as possible especially the connector and format we > >> have > >> >>>>>> well > >> >>>>>> documented, but also providing an mechanism to load connectors > >> >>>>>> according > >> >>>>>> to the DDLs, > >> >>>>>> > >> >>>>>> So I think it could be good to place connector/format jars in > some > >> >>>>>> dir like > >> >>>>>> opt/connector which would not affect jobs by default, and > >> introduce a > >> >>>>>> mechanism of dynamic discovery for SQL. > >> >>>>>> > >> >>>>>> Best, > >> >>>>>> Wenlong > >> >>>>>> > >> >>>>>> On Wed, 15 Apr 2020 at 22:46, Jingsong Li < > jingsongl...@gmail.com> > >> < > >> >>>>> jingsongl...@gmail.com> > >> >>>>>> wrote: > >> >>>>>> > >> >>>>>> > >> >>>>>> Hi, > >> >>>>>> > >> >>>>>> I am thinking both "improve first experience" and "improve > >> production > >> >>>>>> experience". > >> >>>>>> > >> >>>>>> I'm thinking about what's the common mode of Flink? > >> >>>>>> Streaming job use Kafka? 
Batch job use Hive? > >> >>>>>> > >> >>>>>> Hive 1.2.1 dependencies can be compatible with most of Hive > server > >> >>>>>> versions. So Spark and Presto have built-in Hive 1.2.1 > dependency. > >> >>>>>> Flink is currently mainly used for streaming, so let's not talk > >> >>>>>> about hive. > >> >>>>>> > >> >>>>>> For streaming jobs, first of all, the jobs in my mind is (related > >> to > >> >>>>>> connectors): > >> >>>>>> - ETL jobs: Kafka -> Kafka > >> >>>>>> - Join jobs: Kafka -> DimJDBC -> Kafka > >> >>>>>> - Aggregation jobs: Kafka -> JDBCSink > >> >>>>>> So Kafka and JDBC are probably the most commonly used. Of course, > >> >>>>>> > >> >>>>>> also > >> >>>>>> > >> >>>>>> includes CSV, JSON's formats. > >> >>>>>> So when we provide such a fat distribution: > >> >>>>>> - With CSV, JSON. > >> >>>>>> - With flink-kafka-universal and kafka dependencies. > >> >>>>>> - With flink-jdbc. > >> >>>>>> Using this fat distribution, most users can run their jobs well. > >> >>>>>> > >> >>>>>> (jdbc > >> >>>>>> > >> >>>>>> driver jar required, but this is very natural to do) > >> >>>>>> Can these dependencies lead to kinds of conflicts? Only Kafka may > >> >>>>>> > >> >>>>>> have > >> >>>>>> > >> >>>>>> conflicts, but if our goal is to use kafka-universal to support > all > >> >>>>>> Kafka > >> >>>>>> versions, it is hopeful to target the vast majority of users. > >> >>>>>> > >> >>>>>> We don't want to plug all jars into the fat distribution. Only > need > >> >>>>>> less > >> >>>>>> conflict and common. of course, it is a matter of consideration > to > >> >>>>>> > >> >>>>>> put > >> >>>>>> > >> >>>>>> which jar into fat distribution. > >> >>>>>> We have the opportunity to facilitate the majority of users, but > >> >>>>>> also left > >> >>>>>> opportunities for customization. 
> >> >>>>>> > >> >>>>>> Best, > >> >>>>>> Jingsong Lee > >> >>>>>> > >> >>>>>> On Wed, Apr 15, 2020 at 10:09 PM Jark Wu <imj...@gmail.com> < > >> >>>>> imj...@gmail.com> wrote: > >> >>>>>> > >> >>>>>> Hi, > >> >>>>>> > >> >>>>>> I think we should first reach an consensus on "what problem do we > >> >>>>>> want to > >> >>>>>> solve?" > >> >>>>>> (1) improve first experience? or (2) improve production > experience? > >> >>>>>> > >> >>>>>> As far as I can see, with the above discussion, I think what we > >> >>>>>> want to > >> >>>>>> solve is the "first experience". > >> >>>>>> And I think the slim jar is still the best distribution for > >> >>>>>> production, > >> >>>>>> because it's easier to assembling jars > >> >>>>>> than excluding jars and can avoid potential class conflicts. > >> >>>>>> > >> >>>>>> If we want to improve "first experience", I think it make sense > to > >> >>>>>> have a > >> >>>>>> fat distribution to give users a more smooth first experience. > >> >>>>>> But I would like to call it "playground distribution" or > something > >> >>>>>> like > >> >>>>>> that to explicitly differ from the "slim production-purpose > >> >>>>>> > >> >>>>>> distribution". > >> >>>>>> > >> >>>>>> The "playground distribution" can contains some widely used jars, > >> >>>>>> > >> >>>>>> like > >> >>>>>> > >> >>>>>> universal-kafka-sql-connector, elasticsearch7-sql-connector, > avro, > >> >>>>>> json, > >> >>>>>> csv, etc.. > >> >>>>>> Even we can provide a playground docker which may contain the fat > >> >>>>>> distribution, python3, and hive. > >> >>>>>> > >> >>>>>> Best, > >> >>>>>> Jark > >> >>>>>> > >> >>>>>> > >> >>>>>> On Wed, 15 Apr 2020 at 21:47, Chesnay Schepler < > ches...@apache.org> > >> < > >> >>>>> ches...@apache.org> > >> >>>>>> wrote: > >> >>>>>> > >> >>>>>> I don't see a lot of value in having multiple distributions. 
> >> >>>>>> > >> >>>>>> The simple reality is that no fat distribution we could provide > >> >>>>>> > >> >>>>>> would > >> >>>>>> > >> >>>>>> satisfy all use-cases, so why even try. > >> >>>>>> If users commonly run into issues for certain jars, then maybe > >> >>>>>> > >> >>>>>> those > >> >>>>>> > >> >>>>>> should be added to the current distribution. > >> >>>>>> > >> >>>>>> Personally though I still believe we should only distribute a > slim > >> >>>>>> version. I'd rather have users always add required jars to the > >> >>>>>> distribution than only when they go outside our "expected" > >> >>>>>> > >> >>>>>> use-cases. > >> >>>>>> > >> >>>>>> Then we might finally address this issue properly, i.e., tooling > to > >> >>>>>> assemble custom distributions and/or better error messages if > >> >>>>>> Flink-provided extensions cannot be found. > >> >>>>>> > >> >>>>>> On 15/04/2020 15:23, Kurt Young wrote: > >> >>>>>> > >> >>>>>> Regarding to the specific solution, I'm not sure about the "fat" > >> >>>>>> > >> >>>>>> and > >> >>>>>> > >> >>>>>> "slim" > >> >>>>>> > >> >>>>>> solution though. I get the idea > >> >>>>>> that we can make the slim one even more lightweight than current > >> >>>>>> distribution, but what about the "fat" > >> >>>>>> one? Do you mean that we would package all connectors and formats > >> >>>>>> > >> >>>>>> into > >> >>>>>> > >> >>>>>> this? I'm not sure if this is > >> >>>>>> feasible. For example, we can't put all versions of kafka and > hive > >> >>>>>> connector jars into lib directory, and > >> >>>>>> we also might need hadoop jars when using filesystem connector to > >> >>>>>> > >> >>>>>> access > >> >>>>>> > >> >>>>>> data from HDFS. 
> >> >>>>>> > >> >>>>>> So my guess would be we might hand-pick some of the most > >> >>>>>> > >> >>>>>> frequently > >> >>>>>> > >> >>>>>> used > >> >>>>>> > >> >>>>>> connectors and formats > >> >>>>>> into our "lib" directory, like kafka, csv, json metioned above, > >> >>>>>> > >> >>>>>> and > >> >>>>>> > >> >>>>>> still > >> >>>>>> > >> >>>>>> leave some other connectors out of it. > >> >>>>>> If this is the case, then why not we just provide this > >> >>>>>> > >> >>>>>> distribution > >> >>>>>> > >> >>>>>> to > >> >>>>>> > >> >>>>>> user? I'm not sure i get the benefit of > >> >>>>>> providing another super "slim" jar (we have to pay some costs to > >> >>>>>> > >> >>>>>> provide > >> >>>>>> > >> >>>>>> another suit of distribution). > >> >>>>>> > >> >>>>>> What do you think? > >> >>>>>> > >> >>>>>> Best, > >> >>>>>> Kurt > >> >>>>>> > >> >>>>>> > >> >>>>>> On Wed, Apr 15, 2020 at 7:08 PM Jingsong Li < > >> >>>>>> > >> >>>>>> jingsongl...@gmail.com > >> >>>>>> > >> >>>>>> wrote: > >> >>>>>> > >> >>>>>> Big +1. > >> >>>>>> > >> >>>>>> I like "fat" and "slim". > >> >>>>>> > >> >>>>>> For csv and json, like Jark said, they are quite small and don't > >> >>>>>> > >> >>>>>> have > >> >>>>>> > >> >>>>>> other > >> >>>>>> > >> >>>>>> dependencies. They are important to kafka connector, and > >> >>>>>> > >> >>>>>> important > >> >>>>>> > >> >>>>>> to upcoming file system connector too. > >> >>>>>> So can we move them to both "fat" and "slim"? They're so > >> >>>>>> > >> >>>>>> important, > >> >>>>>> > >> >>>>>> and > >> >>>>>> > >> >>>>>> they're so lightweight. > >> >>>>>> > >> >>>>>> Best, > >> >>>>>> Jingsong Lee > >> >>>>>> > >> >>>>>> On Wed, Apr 15, 2020 at 4:53 PM godfrey he <godfre...@gmail.com> > < > >> >>>>> godfre...@gmail.com> > >> >>>>>> wrote: > >> >>>>>> > >> >>>>>> Big +1. > >> >>>>>> This will improve user experience (special for Flink new users). > >> >>>>>> We answered so many questions about "class not found". 
> >> >>>>>> > >> >>>>>> Best, > >> >>>>>> Godfrey > >> >>>>>> > >> >>>>>> Dian Fu <dian0511...@gmail.com> <dian0511...@gmail.com> > >> 于2020年4月15日周三 > >> >>>>> 下午4:30写道: > >> >>>>>> > >> >>>>>> +1 to this proposal. > >> >>>>>> > >> >>>>>> Missing connector jars is also a big problem for PyFlink users. > >> >>>>>> > >> >>>>>> Currently, > >> >>>>>> > >> >>>>>> after a Python user has installed PyFlink using `pip`, he has > >> >>>>>> > >> >>>>>> to > >> >>>>>> > >> >>>>>> manually > >> >>>>>> > >> >>>>>> copy the connector fat jars to the PyFlink installation > >> >>>>>> > >> >>>>>> directory > >> >>>>>> > >> >>>>>> for > >> >>>>>> > >> >>>>>> the > >> >>>>>> > >> >>>>>> connectors to be used if he wants to run jobs locally. This > >> >>>>>> > >> >>>>>> process > >> >>>>>> > >> >>>>>> is > >> >>>>>> > >> >>>>>> very > >> >>>>>> > >> >>>>>> confuse for users and affects the experience a lot. > >> >>>>>> > >> >>>>>> Regards, > >> >>>>>> Dian > >> >>>>>> > >> >>>>>> > >> >>>>>> 在 2020年4月15日,下午3:51,Jark Wu <imj...@gmail.com> <imj...@gmail.com > > > >> 写道: > >> >>>>>> > >> >>>>>> +1 to the proposal. I also found the "download additional jar" > >> >>>>>> > >> >>>>>> step > >> >>>>>> > >> >>>>>> is > >> >>>>>> > >> >>>>>> really verbose when I prepare webinars. > >> >>>>>> > >> >>>>>> At least, I think the flink-csv and flink-json should in the > >> >>>>>> > >> >>>>>> distribution, > >> >>>>>> > >> >>>>>> they are quite small and don't have other dependencies. > >> >>>>>> > >> >>>>>> Best, > >> >>>>>> Jark > >> >>>>>> > >> >>>>>> On Wed, 15 Apr 2020 at 15:44, Jeff Zhang <zjf...@gmail.com> < > >> >>>>> zjf...@gmail.com> > >> >>>>>> wrote: > >> >>>>>> > >> >>>>>> Hi Aljoscha, > >> >>>>>> > >> >>>>>> Big +1 for the fat flink distribution, where do you plan to > >> >>>>>> > >> >>>>>> put > >> >>>>>> > >> >>>>>> these > >> >>>>>> > >> >>>>>> connectors ? opt or lib ? 
> >> >>>>>> > >> >>>>>> Aljoscha Krettek <aljos...@apache.org> <aljos...@apache.org> > >> >>>>> 于2020年4月15日周三 > >> >>>>>> 下午3:30写道: > >> >>>>>> > >> >>>>>> > >> >>>>>> Hi Everyone, > >> >>>>>> > >> >>>>>> I'd like to discuss about releasing a more full-featured > >> >>>>>> > >> >>>>>> Flink > >> >>>>>> > >> >>>>>> distribution. The motivation is that there is friction for > >> >>>>>> > >> >>>>>> SQL/Table > >> >>>>>> > >> >>>>>> API > >> >>>>>> > >> >>>>>> users that want to use Table connectors which are not there > >> >>>>>> > >> >>>>>> in > >> >>>>>> > >> >>>>>> the > >> >>>>>> > >> >>>>>> current Flink Distribution. For these users the workflow is > >> >>>>>> > >> >>>>>> currently > >> >>>>>> > >> >>>>>> roughly: > >> >>>>>> > >> >>>>>> - download Flink dist > >> >>>>>> - configure csv/Kafka/json connectors per configuration > >> >>>>>> - run SQL client or program > >> >>>>>> - decrypt error message and research the solution > >> >>>>>> - download additional connector jars > >> >>>>>> - program works correctly > >> >>>>>> > >> >>>>>> I realize that this can be made to work but if every SQL > >> >>>>>> > >> >>>>>> user > >> >>>>>> > >> >>>>>> has > >> >>>>>> > >> >>>>>> this > >> >>>>>> > >> >>>>>> as their first experience that doesn't seem good to me. > >> >>>>>> > >> >>>>>> My proposal is to provide two versions of the Flink > >> >>>>>> > >> >>>>>> Distribution > >> >>>>>> > >> >>>>>> in > >> >>>>>> > >> >>>>>> the > >> >>>>>> > >> >>>>>> future: "fat" and "slim" (names to be discussed): > >> >>>>>> > >> >>>>>> - slim would be even trimmer than todays distribution > >> >>>>>> - fat would contain a lot of convenience connectors (yet > >> >>>>>> > >> >>>>>> to > >> >>>>>> > >> >>>>>> be > >> >>>>>> > >> >>>>>> determined which one) > >> >>>>>> > >> >>>>>> And yes, I realize that there are already more dimensions of > >> >>>>>> > >> >>>>>> Flink > >> >>>>>> > >> >>>>>> releases (Scala version and Java version). 
> >> >>>>>> > >> >>>>>> For background, our current Flink dist has these in the opt > >> >>>>>> > >> >>>>>> directory: > >> >>>>>> > >> >>>>>> - flink-azure-fs-hadoop-1.10.0.jar > >> >>>>>> - flink-cep-scala_2.12-1.10.0.jar > >> >>>>>> - flink-cep_2.12-1.10.0.jar > >> >>>>>> - flink-gelly-scala_2.12-1.10.0.jar > >> >>>>>> - flink-gelly_2.12-1.10.0.jar > >> >>>>>> - flink-metrics-datadog-1.10.0.jar > >> >>>>>> - flink-metrics-graphite-1.10.0.jar > >> >>>>>> - flink-metrics-influxdb-1.10.0.jar > >> >>>>>> - flink-metrics-prometheus-1.10.0.jar > >> >>>>>> - flink-metrics-slf4j-1.10.0.jar > >> >>>>>> - flink-metrics-statsd-1.10.0.jar > >> >>>>>> - flink-oss-fs-hadoop-1.10.0.jar > >> >>>>>> - flink-python_2.12-1.10.0.jar > >> >>>>>> - flink-queryable-state-runtime_2.12-1.10.0.jar > >> >>>>>> - flink-s3-fs-hadoop-1.10.0.jar > >> >>>>>> - flink-s3-fs-presto-1.10.0.jar > >> >>>>>> - > >> >>>>>> > >> >>>>>> flink-shaded-netty-tcnative-dynamic-2.0.25.Final-9.0.jar > >> >>>>>> > >> >>>>>> - flink-sql-client_2.12-1.10.0.jar > >> >>>>>> - flink-state-processor-api_2.12-1.10.0.jar > >> >>>>>> - flink-swift-fs-hadoop-1.10.0.jar > >> >>>>>> > >> >>>>>> Current Flink dist is 267M. If we removed everything from > >> >>>>>> > >> >>>>>> opt > >> >>>>>> > >> >>>>>> we > >> >>>>>> > >> >>>>>> would > >> >>>>>> > >> >>>>>> go down to 126M. I would reccomend this, because the large > >> >>>>>> > >> >>>>>> majority > >> >>>>>> > >> >>>>>> of > >> >>>>>> > >> >>>>>> the files in opt are probably unused. > >> >>>>>> > >> >>>>>> What do you think? > >> >>>>>> > >> >>>>>> Best, > >> >>>>>> Aljoscha > >> >>>>>> > >> >>>>>> > >> >>>>>> > >> >>>>>> -- > >> >>>>>> Best Regards > >> >>>>>> > >> >>>>>> Jeff Zhang > >> >>>>>> > >> >>>>>> > >> >>>>>> -- > >> >>>>>> Best, Jingsong Lee > >> >>>>>> > >> >>>>>> > >> >>>>>> -- > >> >>>>>> Best, Jingsong Lee > >> >>>>>> > >> >>>>>> > >> >>>>>> > >> >>>>>> > >> >> > >> > >> > > > > -- > > Best, Jingsong Lee > > > > > -- > Best, Jingsong Lee >
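[Editor's illustrative sketch, not part of the original thread.] Aljoscha's suggestion above is that the sql-client launcher script could pick connector/format jars up from opt/ instead of requiring them in lib/. A minimal sketch of that discovery step is below; the `FLINK_OPT_DIR` variable, the function name, and the jar patterns are assumptions for illustration only, not the actual Flink launcher code.

```shell
# Sketch: collect the small, dependency-free format jars from opt/ into a
# classpath fragment, the way a sql-client wrapper script might.
FLINK_OPT_DIR="${FLINK_OPT_DIR:-./opt}"

build_format_classpath() {
    cp=""
    # Look for the three format jars discussed in the thread.
    for pattern in flink-csv flink-json flink-avro; do
        for jar in "$FLINK_OPT_DIR"/${pattern}-*.jar; do
            # Skip the literal pattern when the glob matched nothing.
            [ -e "$jar" ] || continue
            # Append with ':' separator, but no leading ':' on the first jar.
            cp="${cp:+$cp:}$jar"
        done
    done
    printf '%s\n' "$cp"
}

build_format_classpath
```

The same loop could be extended to SQL connectors, which is essentially the "pick up from opt/" tooling the thread describes.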