Hi, I already saw a project with the same idea: https://github.com/cloudera-labs/envelope
Regards,

On Wed, 14 Jun 2017 at 04:32, bo yang <bobyan...@gmail.com> wrote:

> Thanks Benjamin and Ayan for the feedback! You two represent the two groups
> of people who do or do not need such a script tool. Personally, I find the
> script very useful for writing ETL pipelines and daily jobs. Let's see
> whether other people are interested in such a project.
>
> Best,
> Bo
>
> On Mon, Jun 12, 2017 at 11:26 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi,
>>
>> IMHO, this approach is not very useful.
>>
>> First, on the two use cases mentioned in the project page:
>>
>> 1. Simplify Spark development - I think the only thing that can be done
>> there is to come up with some boilerplate function which essentially takes
>> a SQL statement and comes back with a temp table name and a corresponding
>> DataFrame (remember, the project targets structured data sources only,
>> not streaming or RDDs). Building another mini-DSL on top of the already
>> fairly elaborate Spark API never appealed to me.
>>
>> 2. Business analysts using Spark - the single-word answer is notebooks.
>> Take your pick: Jupyter, Zeppelin, Hue.
>>
>> The perception that "Spark is for developers", IMHO, stems from the
>> packaging/building overhead of Spark apps. For Python users, this barrier
>> is considerably lower (and maybe that is why I do not see a prominent
>> need).
>>
>> But I can imagine the pain of a SQL developer coming into a Scala/Java
>> world. I came from a hardcore SQL/DWH environment where I used to write
>> SQL and SQL only. So SBT and Maven are still not my friends; maybe someday
>> they will be. I learned them the hard way, because the value of using
>> Spark offsets the pain by a long way. So I think it is worth spending time
>> with the environment to get comfortable with it. And maybe, just maybe,
>> using NiFi in case you miss drag-and-drop features too much :)
>>
>> But these are my 2c, and sincerely humble opinion, and I wish you all
>> the luck for your project.
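[Editor's note: the "boilerplate function" ayan describes above - take a SQL statement, come back with a temp table name and a corresponding DataFrame - can be sketched as follows. This is a minimal illustration, not Spark code: it uses Python's sqlite3 as a stand-in for Spark SQL, and a list of dicts as a stand-in for a DataFrame. All names (`sql_step`, the tables) are hypothetical.]

```python
import sqlite3

def sql_step(conn, sql, view_name):
    """Run one SQL statement and register its result under view_name,
    so later statements can refer to it by name - the equivalent of
    Spark's temp views. Returns (name, rows-as-list-of-dicts)."""
    conn.execute("DROP VIEW IF EXISTS " + view_name)
    conn.execute("CREATE VIEW {} AS {}".format(view_name, sql))
    cur = conn.execute("SELECT * FROM " + view_name)
    cols = [d[0] for d in cur.description]
    return view_name, [dict(zip(cols, row)) for row in cur.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (2, 25.0), (3, 40.0)])

# Each step is one SQL statement; its output becomes a named input
# for the next step, like chained temp views in a Spark SQL script.
name, big = sql_step(conn, "SELECT * FROM orders WHERE amount > 20",
                     "big_orders")
_, total = sql_step(conn, "SELECT SUM(amount) AS total FROM big_orders",
                    "summary")
print(total)  # [{'total': 65.0}]
```

In PySpark the same chaining is available out of the box via `df.createOrReplaceTempView(name)` followed by `spark.sql(...)`, which is part of why ayan sees little room for an extra layer.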
>>
>> On Tue, Jun 13, 2017 at 3:23 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> Hi Bo,
>>>
>>> +1 for your project. I come from the world of data warehouses, ETL, and
>>> reporting analytics. There are many individuals who do not know how to
>>> code or do not want to. They are content with ANSI SQL and stick to it.
>>> ETL workflows are also built without any coding, using drag-and-drop
>>> user interfaces such as Talend, SSIS, etc. There is a small amount of
>>> scripting involved, but not too much. I looked at what you are trying to
>>> do, and I welcome it. This could open up Spark to the masses and shorten
>>> development times.
>>>
>>> Cheers,
>>> Ben
>>>
>>> On Jun 12, 2017, at 10:14 PM, bo yang <bobyan...@gmail.com> wrote:
>>>
>>> Hi Aakash,
>>>
>>> Thanks for your willingness to help :) It would be great if I could get
>>> more feedback on my project. For example, are there other people who
>>> feel the need for a script to write Spark jobs easily? I would also
>>> explore whether the Spark project itself could take on some work to
>>> build such a script-based high-level DSL.
>>>
>>> Best,
>>> Bo
>>>
>>> On Mon, Jun 12, 2017 at 12:14 PM, Aakash Basu
>>> <aakash.spark....@gmail.com> wrote:
>>>
>>>> Hey,
>>>>
>>>> I work on Spark SQL and would pretty much be able to help you with
>>>> this. Let me know your requirements.
>>>>
>>>> Thanks,
>>>> Aakash.
>>>>
>>>> On 12-Jun-2017 11:00 AM, "bo yang" <bobyan...@gmail.com> wrote:
>>>>
>>>>> Hi Guys,
>>>>>
>>>>> I am writing a small open source project
>>>>> <https://github.com/uber/uberscriptquery> to use SQL scripts to write
>>>>> Spark jobs, and I want to see if there are other people interested in
>>>>> using or contributing to it.
>>>>>
>>>>> The project is called UberScriptQuery
>>>>> (https://github.com/uber/uberscriptquery). Sorry for the clumsy name;
>>>>> I chose it to avoid conflicts with many other names (Spark is a
>>>>> registered trademark, so I could not use Spark in my project name).
>>>>>
>>>>> In short, it is a high-level SQL-like DSL (Domain Specific Language)
>>>>> on top of Spark. People can use the DSL to write Spark jobs without
>>>>> worrying about Spark internals. Please check the README
>>>>> <https://github.com/uber/uberscriptquery> in the project for more
>>>>> details.
>>>>>
>>>>> It would be great to get any feedback or suggestions!
>>>>>
>>>>> Best,
>>>>> Bo
>>>>>
>>
>> --
>> Best Regards,
>> Ayan Guha

--
M'BAREK Med Nihed,
Fedora Ambassador, TUNISIA, Northern Africa
http://www.nihed.com
<http://tn.linkedin.com/in/nihed>
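[Editor's note: to make the thread concrete, the kind of "SQL script" DSL Bo describes - a sequence of named SQL assignments, each result feeding later statements - can be sketched as a tiny interpreter. The `name = SELECT ...;` syntax here is hypothetical, not UberScriptQuery's actual grammar, and sqlite3 again stands in for Spark SQL.]

```python
import re
import sqlite3

def run_script(conn, script):
    """Interpret a tiny 'name = SELECT ...;' script: each assignment
    registers its result as a named view that later statements can
    query, hiding the engine's registration boilerplate from the
    script author. Returns {name: rows} for every assignment."""
    results = {}
    for stmt in filter(None, (s.strip() for s in script.split(";"))):
        # Split "name = SELECT ..." at the first '=' only.
        name, sql = re.split(r"\s*=\s*", stmt, maxsplit=1)
        conn.execute("CREATE VIEW {} AS {}".format(name, sql))
        results[name] = conn.execute("SELECT * FROM " + name).fetchall()
    return results

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, n INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("a", 1), ("a", 2), ("b", 5)])

# Two chained statements: the second reads the first's named result.
out = run_script(conn, """
    per_user = SELECT user, SUM(n) AS total FROM events GROUP BY user;
    top = SELECT user FROM per_user WHERE total > 2
""")
print(sorted(out["top"]))  # [('a',), ('b',)]
```

This is the whole appeal ayan and Benjamin debate above: a SQL-only author writes two statements and never sees the engine API; the cost is a new mini-grammar layered on top of it.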