Hi

I have already seen a project with a similar idea:
https://github.com/cloudera-labs/envelope

Regards,

On Wed, 14 Jun 2017 at 04:32, bo yang <bobyan...@gmail.com> wrote:

> Thanks Benjamin and Ayan for the feedback! You represent the two groups of
> people who either need such a script tool or do not. Personally, I find the
> script very useful for writing ETL pipelines and daily jobs. Let's see
> whether other people are interested in such a project.
>
> Best,
> Bo
>
>
> On Mon, Jun 12, 2017 at 11:26 PM, ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi
>>
>> IMHO, this approach is not very useful.
>>
>> First, regarding the two use cases mentioned on the project page:
>>
>> 1. Simplify Spark development - I think the only thing that can be done
>> there is to come up with a boilerplate function that essentially takes a
>> SQL statement and returns a temp table name and a corresponding DataFrame
>> (remember, the project targets structured data sources only, not streaming
>> or RDDs). Building another mini-DSL on top of the already fairly elaborate
>> Spark API has never appealed to me.
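The boilerplate function described in point 1 can be sketched in miniature. This is a hypothetical illustration, not anything from the project: it uses Python's stdlib sqlite3 as a stand-in for Spark's SQL engine (in Spark, the two marked lines would be `df = spark.sql(sql)` and `df.createOrReplaceTempView(name)`), and all names here are invented:

```python
import itertools
import sqlite3

_counter = itertools.count()

def run_sql(conn, sql):
    # Run one SQL statement, materialize its result under a generated
    # temp-table name, and return (name, rows).  In Spark this would be:
    #   df = spark.sql(sql)
    #   df.createOrReplaceTempView(name)
    name = f"tmp_{next(_counter)}"
    conn.execute(f"CREATE TEMP TABLE {name} AS {sql}")
    rows = conn.execute(f"SELECT * FROM {name}").fetchall()
    return name, rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10), ("west", 20), ("east", 5)])

# Each statement's result is addressable by name in later statements.
name, totals = run_sql(conn, "SELECT region, SUM(amount) AS total "
                             "FROM sales GROUP BY region")
_, top = run_sql(conn, f"SELECT region FROM {name} WHERE total > 15")
```

The returned name lets a SQL-only user chain statements without ever touching the DataFrame API, which is essentially the whole value such a wrapper provides.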
>>
>> 2. Business analysts using Spark - the single-word answer is notebooks.
>> Take your pick: Jupyter, Zeppelin, Hue.
>>
>> The notion that "Spark is for developers", IMHO, stems from the
>> packaging/build overhead of Spark apps. For Python users, this barrier is
>> considerably lower (and maybe that is why I do not see a pressing need).
>>
>> But I can imagine the pain of a SQL developer coming into the Scala/Java
>> world. I came from a hardcore SQL/DWH environment where I used to write SQL
>> and SQL only, so SBT and MVN are still not my friends. Maybe someday they
>> will be. But I learned them the hard way, because the value of using Spark
>> offsets the pain by a long way. So I think there is a need to spend time
>> with the environment to get comfortable with it. And maybe, just maybe, to
>> use NiFi in case you miss drag-and-drop features too much :)
>>
>> But these are my 2c and my sincerely humble opinion, and I wish you all
>> the luck with your project.
>>
>> On Tue, Jun 13, 2017 at 3:23 PM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>
>>> Hi Bo,
>>>
>>> +1 for your project. I come from the world of data warehouses, ETL, and
>>> reporting analytics. There are many individuals who do not know how to
>>> code or do not want to. They are content with ANSI SQL and stick to it.
>>> ETL workflows are also built without any coding, using a drag-and-drop
>>> user interface such as Talend, SSIS, etc. There is a small amount of
>>> scripting involved, but not too much. I looked at what you are trying to
>>> do, and I welcome it. This could open up Spark to the masses and shorten
>>> development times.
>>>
>>> Cheers,
>>> Ben
>>>
>>>
>>> On Jun 12, 2017, at 10:14 PM, bo yang <bobyan...@gmail.com> wrote:
>>>
>>> Hi Aakash,
>>>
>>> Thanks for your willingness to help :) It would be great to get more
>>> feedback on my project. For example, are there other people who feel the
>>> need for a script to write Spark jobs easily? Also, I will explore
>>> whether it is possible for the Spark project to take on some work to
>>> build such a script-based high-level DSL.
>>>
>>> Best,
>>> Bo
>>>
>>>
>>> On Mon, Jun 12, 2017 at 12:14 PM, Aakash Basu <
>>> aakash.spark....@gmail.com> wrote:
>>>
>>>> Hey,
>>>>
>>>> I work on Spark SQL and would pretty much be able to help you in this.
>>>> Let me know your requirement.
>>>>
>>>> Thanks,
>>>> Aakash.
>>>>
>>>> On 12-Jun-2017 11:00 AM, "bo yang" <bobyan...@gmail.com> wrote:
>>>>
>>>>> Hi Guys,
>>>>>
>>>>> I am writing a small open source project
>>>>> <https://github.com/uber/uberscriptquery> to use SQL scripts to write
>>>>> Spark jobs. I want to see if other people are interested in using or
>>>>> contributing to this project.
>>>>>
>>>>> The project is called UberScriptQuery (
>>>>> https://github.com/uber/uberscriptquery). Sorry for the awkward name; I
>>>>> chose it to avoid conflicts with many other names (Spark is a registered
>>>>> trademark, so I could not use Spark in my project name).
>>>>>
>>>>> In short, it is a high-level SQL-like DSL (Domain-Specific Language)
>>>>> on top of Spark. People can use that DSL to write Spark jobs without
>>>>> worrying about Spark's internal details. Please check the README
>>>>> <https://github.com/uber/uberscriptquery> in the project for more
>>>>> details.
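To make the idea concrete, a script-based DSL of this kind can be reduced to a tiny interpreter: split the script into statements and materialize each result as a named temp table that later statements can reference. The syntax below (`name := SELECT ...;`) is purely hypothetical, not the project's actual grammar, and sqlite3 again stands in for Spark's SQL engine:

```python
import sqlite3

def run_script(conn, script):
    # Interpret a tiny "name := SELECT ...;" script: each statement's
    # result is materialized as a temp table that later statements can
    # reference by name.
    for stmt in filter(None, (s.strip() for s in script.split(";"))):
        name, _, sql = stmt.partition(":=")
        conn.execute(f"CREATE TEMP TABLE {name.strip()} AS {sql.strip()}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, n INT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("a", 1), ("a", 2), ("b", 5)])

run_script(conn, """
    totals := SELECT user, SUM(n) AS total FROM events GROUP BY user;
    big    := SELECT user FROM totals WHERE total > 3;
""")
result = conn.execute("SELECT * FROM big").fetchall()
```

The point of the pattern is that a user writes only SQL plus assignments; the host engine (Spark, in the real project) handles execution and intermediate-result plumbing.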
>>>>>
>>>>> It would be great to get any feedback or suggestions!
>>>>>
>>>>> Best,
>>>>> Bo
>>>>>
>>>>>
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
> --

M'BAREK Med Nihed,
Fedora Ambassador, TUNISIA, Northern Africa
http://www.nihed.com

<http://tn.linkedin.com/in/nihed>
