Hi Nihed,

Interesting to see Envelope. The idea is the same there! Thanks for sharing :)
Best,
Bo

On Wed, Jun 14, 2017 at 12:22 AM, nihed mbarek <nihe...@gmail.com> wrote:

> Hi,
>
> I have already seen a project with the same idea:
> https://github.com/cloudera-labs/envelope
>
> Regards,
>
> On Wed, 14 Jun 2017 at 04:32, bo yang <bobyan...@gmail.com> wrote:
>
>> Thanks Benjamin and Ayan for the feedback! You represent the two groups of
>> people: those who need such a script tool and those who do not. Personally,
>> I find the script very useful for writing ETL pipelines and daily jobs.
>> Let's see whether other people are interested in such a project.
>>
>> Best,
>> Bo
>>
>> On Mon, Jun 12, 2017 at 11:26 PM, ayan guha <guha.a...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> IMHO, this approach is not very useful.
>>>
>>> First, regarding the two use cases mentioned on the project page:
>>>
>>> 1. Simplify Spark development - I think the only thing that can be done
>>> there is to come up with some boilerplate function, which essentially
>>> takes a SQL statement and comes back with a temp table name and a
>>> corresponding DataFrame (remember, the project targets structured data
>>> sources only, not streaming or RDDs). Building another mini-DSL on top of
>>> the already fairly elaborate Spark API has never appealed to me.
>>>
>>> 2. Business analysts using Spark - the single-word answer is notebooks.
>>> Take your pick: Jupyter, Zeppelin, Hue.
>>>
>>> The notion that "Spark is for developers", IMHO, stems from the
>>> packaging/build overhead of Spark apps. For Python users, this barrier is
>>> considerably lower (and maybe that is why I do not see a prominent need).
>>>
>>> But I can imagine the pain of a SQL developer coming into a Scala/Java
>>> world. I came from a hardcore SQL/DWH environment where I used to write
>>> SQL and SQL only. So SBT and MVN are still not my friends; maybe someday
>>> they will be. But I learned them the hard way, because the value of using
>>> Spark offsets the pain by a long, long way. So, I think one needs to
>>> spend time with the environment to get comfortable with it. And maybe,
>>> just maybe, use NiFi if you miss drag-and-drop features too much :)
>>>
>>> But these are my 2c, and my sincerely humble opinion, and I wish you all
>>> the luck with your project.
>>>
>>> On Tue, Jun 13, 2017 at 3:23 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>
>>>> Hi Bo,
>>>>
>>>> +1 for your project. I come from the world of data warehouses, ETL, and
>>>> reporting analytics. There are many individuals who do not know how, or
>>>> do not want, to do any coding. They are content with ANSI SQL and stick
>>>> to it. ETL workflows are also built without any coding, using
>>>> drag-and-drop user interfaces such as Talend, SSIS, etc. There is a
>>>> small amount of scripting involved, but not too much. I looked at what
>>>> you are trying to do, and I welcome it. This could open up Spark to the
>>>> masses and shorten development times.
>>>>
>>>> Cheers,
>>>> Ben
>>>>
>>>> On Jun 12, 2017, at 10:14 PM, bo yang <bobyan...@gmail.com> wrote:
>>>>
>>>> Hi Aakash,
>>>>
>>>> Thanks for your willingness to help :) It would be great to get more
>>>> feedback on my project. For example, do other people feel the need for
>>>> a script to write Spark jobs easily? I would also like to explore
>>>> whether the Spark project might take on some work to build such a
>>>> script-based, high-level DSL.
>>>>
>>>> Best,
>>>> Bo
>>>>
>>>> On Mon, Jun 12, 2017 at 12:14 PM, Aakash Basu
>>>> <aakash.spark....@gmail.com> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> I work on Spark SQL and would pretty much be able to help you with
>>>>> this. Let me know your requirements.
>>>>>
>>>>> Thanks,
>>>>> Aakash.
>>>>>
>>>>> On 12-Jun-2017 11:00 AM, "bo yang" <bobyan...@gmail.com> wrote:
>>>>>
>>>>>> Hi Guys,
>>>>>>
>>>>>> I am writing a small open source project
>>>>>> <https://github.com/uber/uberscriptquery> to use SQL scripts to
>>>>>> write Spark jobs. I want to see whether other people are interested
>>>>>> in using or contributing to this project.
>>>>>>
>>>>>> The project is called UberScriptQuery
>>>>>> (https://github.com/uber/uberscriptquery). Sorry for the clumsy
>>>>>> name; it avoids conflicts with many other names (Spark is a
>>>>>> registered trademark, so I could not use Spark in my project name).
>>>>>>
>>>>>> In short, it is a high-level SQL-like DSL (Domain Specific Language)
>>>>>> on top of Spark. People can use the DSL to write Spark jobs without
>>>>>> worrying about Spark internals. Please check the README
>>>>>> <https://github.com/uber/uberscriptquery> in the project for more
>>>>>> details.
>>>>>>
>>>>>> It would be great to get any feedback or suggestions!
>>>>>>
>>>>>> Best,
>>>>>> Bo
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha

--
M'BAREK Med Nihed,
Fedora Ambassador, TUNISIA, Northern Africa
http://www.nihed.com
<http://tn.linkedin.com/in/nihed>
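[Editor's note] For readers following the thread, the "boilerplate" Ayan describes (take a SQL statement, come back with a temp table name and a corresponding DataFrame) can be sketched in a few lines of plain Python. This is a hypothetical illustration of the general pattern, not UberScriptQuery's actual syntax or API: the `parse_script` helper and the `name = SELECT ...;` statement form are invented for the example, and the Spark call is shown only as a comment.

```python
# Hypothetical sketch: split a SQL-script DSL into (temp_table_name, sql)
# pairs, i.e. the boilerplate a script-to-Spark tool would automate.
# The "name = SELECT ...;" statement form is invented for illustration;
# it is not UberScriptQuery's real syntax.
import re

def parse_script(script):
    """Parse 'name = <sql>;' statements into an ordered list of
    (temp_table_name, sql) tuples."""
    statements = []
    # Each statement: an identifier, '=', arbitrary SQL text, ended by ';'.
    # [^;]+ deliberately allows '=' and newlines inside the SQL body.
    for match in re.finditer(r"(\w+)\s*=\s*([^;]+);", script):
        name, sql = match.group(1), match.group(2).strip()
        statements.append((name, sql))
    return statements

script = """
daily_orders = SELECT * FROM orders WHERE dt = '2017-06-12';
totals = SELECT user_id, SUM(amount) FROM daily_orders GROUP BY user_id;
"""

for name, sql in parse_script(script):
    # With Spark, each result would be registered as a temp view, e.g.:
    #   spark.sql(sql).createOrReplaceTempView(name)
    print(name, "<-", sql)
```

Later statements can then refer to earlier temp table names (as `totals` refers to `daily_orders` above), which is what lets a linear SQL script stand in for a chain of DataFrame transformations.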