Re: SQL in Python SDK

Mingmin Xu Fri, 13 Apr 2018 10:54:27 -0700

With the current implementation we're not able to extend it to Python, as
Calcite has a Java API only. A separate, Python-based SQL implementation would
be the solution. In our practice, we write lots of UDFs/UDAFs and customized
TABLEs to fit our own data sources/storage. For the former, it could be
possible with an adaptor like https://github.com/ninia/jep (just a rough
idea and not verified); for the latter, I don't see an option so far.
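As a rough, standalone illustration of what a UDAF (user-defined aggregate function) can look like when written in Python, the sketch below follows the accumulator pattern used by Beam's Python `CombineFn` (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`). It does not import `apache_beam`. The class `MeanFn` and the helper `run_combine`, which only simulates per-shard accumulation followed by a merge, are hypothetical names chosen for this example.

```python
class MeanFn:
    """Hypothetical UDAF computing an average.

    The method names mirror Beam's Python CombineFn interface, but this
    sketch is standalone and does not subclass or import apache_beam.
    """

    def create_accumulator(self):
        # Accumulator is a (running_sum, count) pair.
        return (0.0, 0)

    def add_input(self, accumulator, value):
        total, count = accumulator
        return (total + value, count + 1)

    def merge_accumulators(self, accumulators):
        # Combine partial results computed independently (e.g. per worker).
        total, count = 0.0, 0
        for t, c in accumulators:
            total += t
            count += c
        return (total, count)

    def extract_output(self, accumulator):
        total, count = accumulator
        return total / count if count else float("nan")


def run_combine(fn, shards):
    """Simulate a distributed combine: one accumulator per shard, then merge."""
    partials = []
    for shard in shards:
        acc = fn.create_accumulator()
        for value in shard:
            acc = fn.add_input(acc, value)
        partials.append(acc)
    return fn.extract_output(fn.merge_accumulators(partials))
```

For example, `run_combine(MeanFn(), [[1, 2, 3], [4, 5]])` returns `3.0`. Keeping the accumulator mergeable is what allows partial aggregates to be computed in parallel; how such a Python function would be invoked from the Java/Calcite-based SQL is the open question in this thread.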


On Fri, Apr 13, 2018 at 9:11 AM, Robert Bradshaw <rober...@google.com>
wrote:

> On Fri, Apr 13, 2018 at 8:16 AM Andrew Pilloud <apill...@google.com>
> wrote:
>
>> Hi Gabor,
>>
>> Are Python UDFs (User-defined functions) something that might work for
>> you? If all you really need to write in Python is your DoFn this is
>> probably your best option.
>>
>
> +1. Note that since Python has tuples as builtin objects, this is a bit
> easier than in Java.
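To illustrate the point about tuples: in Python a UDF can accept and return a multi-column row as a plain tuple, with no row or schema class to declare. This is a standalone sketch; `split_name` and `apply_udf` are hypothetical names, and the row layout `(full_name, age)` is an assumption made for the example.

```python
from typing import Iterable, List, Tuple

# Assumed row layout for this example: (full_name, age).
Row = Tuple[str, int]


def split_name(row: Row) -> Tuple[str, str, int]:
    """Hypothetical scalar UDF: split 'first last' into two output columns."""
    full_name, age = row
    first, _, last = full_name.partition(" ")
    # The output row is just another tuple; no Row/schema class is needed.
    return (first, last, age)


def apply_udf(udf, rows: Iterable[tuple]) -> List[tuple]:
    """Apply a row-to-row UDF to every input row."""
    return [udf(row) for row in rows]
```

For example, `apply_udf(split_name, [("Ada Lovelace", 36)])` returns `[("Ada", "Lovelace", 36)]`. Inside a Beam Python pipeline, the same function could be passed to `beam.Map`.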
>
>
>> It is still a bit of work but we support Java UDFs today, so all you
>> would need to do is write a Java wrapper to call your Python function.
>>
>
> This is easier said than done (in an efficient manner at least), and
> probably the most compelling reason to implement SQL in pure Python (though
> I'm not convinced it outweighs the downsides).
>
>
>>
>> Andrew
>>
>>
>> On Fri, Apr 13, 2018, 7:58 AM Kenneth Knowles <k...@google.com> wrote:
>>
>>> The most recent work on cross-language pipeline authoring is the design
>>> brainstorming at https://s.apache.org/beam-mixed-language-pipelines so
>>> it is still in the preliminary stages. There's no basic mystery, but there
>>> are a lot of practical considerations about what is easy to run on a
>>> pipeline author's machine.
>>>
>>> Regarding Apache Calcite - it is a Java library. It doesn't really make
>>> sense to bind it to Python. Today we don't use most of its capabilities. We
>>> just use it as a parser mostly. It would be easy to find an existing parser
>>> in Python or write your own (with ply, the basics could be done within a
>>> day). But still I don't think it makes sense to reimplement and maintain
>>> the SQL-to-Beam translation in multiple languages.
>>>
>>> Kenn
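Kenn's estimate that a basic SQL parser is a small job in Python can be illustrated with a toy example. Rather than ply (which he mentions), this dependency-free sketch uses a regex tokenizer and a hand-written recursive-descent parser for a hypothetical tiny subset: `SELECT col, ... FROM table [WHERE col = literal]`. The names `Select`, `tokenize`, `Parser` and `parse` are illustrative only and are not part of Beam or Calcite.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One token per match: integer, 'string', identifier/keyword, or punctuation.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|('[^']*')|([A-Za-z_][A-Za-z0-9_]*)|([*,=]))")


@dataclass
class Select:
    columns: List[str]
    table: str
    where: Optional[Tuple[str, object]] = None  # (column, literal); equality only


def tokenize(sql: str) -> List[str]:
    sql, pos, tokens = sql.strip(), 0, []
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at {pos}: {sql[pos:]!r}")
        tokens.append(m.group(m.lastindex))  # text of whichever group matched
        pos = m.end()
    return tokens


class Parser:
    """Recursive-descent parser for: SELECT cols FROM table [WHERE col = literal]."""

    def __init__(self, tokens: List[str]):
        self.tokens, self.i = tokens, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of query")
        self.i += 1
        return tok

    def at_keyword(self, kw: str) -> bool:
        tok = self.peek()
        return tok is not None and tok.upper() == kw

    def expect_keyword(self, kw: str) -> None:
        tok = self.take()
        if tok.upper() != kw:
            raise SyntaxError(f"expected {kw}, got {tok!r}")

    def parse_select(self) -> Select:
        self.expect_keyword("SELECT")
        columns = [self.take()]
        while self.peek() == ",":
            self.take()
            columns.append(self.take())
        self.expect_keyword("FROM")
        table = self.take()
        where = None
        if self.at_keyword("WHERE"):
            self.take()
            column = self.take()
            if self.take() != "=":
                raise SyntaxError("only '=' is supported in WHERE")
            where = (column, self.literal(self.take()))
        if self.peek() is not None:
            raise SyntaxError(f"unexpected trailing token {self.peek()!r}")
        return Select(columns, table, where)

    @staticmethod
    def literal(tok: str) -> object:
        if tok.isdigit():
            return int(tok)
        if tok.startswith("'"):
            return tok[1:-1]  # strip the surrounding quotes
        raise SyntaxError(f"expected a literal, got {tok!r}")


def parse(sql: str) -> Select:
    return Parser(tokenize(sql)).parse_select()
```

For example, `parse("SELECT name, age FROM people WHERE age = 36")` yields `Select(columns=['name', 'age'], table='people', where=('age', 36))`. Parsing is the easy half; as Kenn points out, what would be costly to duplicate across languages is the translation of the parsed query into a Beam pipeline.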
>>>
>>> On Fri, Apr 13, 2018 at 2:43 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> If someone implemented it directly in Python then it would be supported
>>>> directly in Python. I don't know if anyone is actively working on that -
>>>> the current implementation uses Apache Calcite, and I don't know whether
>>>> they have a Python API.
>>>>
>>>> On Fri, Apr 13, 2018 at 9:40 AM Prabeesh K. <prabsma...@gmail.com>
>>>> wrote:
>>>>
>>>>> What about supporting SQL in Python SDK?
>>>>>
>>>>> On 13 April 2018 at 13:32, Reuven Lax <re...@google.com> wrote:
>>>>>
>>>>>> The portability work will allow the Python and Java SDKs to be used
>>>>>> in the same pipeline, though this work is not yet complete.
>>>>>>
>>>>>>
>>>>> This would be an interesting feature.
>>>>>
>>>>>> On Fri, Apr 13, 2018 at 9:15 AM Gabor Hermann <m...@gaborhermann.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey all,
>>>>>>>
>>>>>>> Are there any efforts towards supporting SQL from the Python SDK, not
>>>>>>> just from Java? I couldn't find any info about this in JIRA or mailing
>>>>>>> lists.
>>>>>>>
>>>>>>> How much effort do you think it would take to implement this? Are there
>>>>>>> some dependencies like supporting more features in Python? I know that
>>>>>>> the Python SDK is experimental.
>>>>>>>
>>>>>>> As an alternative, is there a way to combine Python and Java SDKs in the
>>>>>>> same pipeline?
>>>>>>>
>>>>>>> Thanks for your answers in advance!
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Gabor
>>>>>>>
>>>>>>>
>>>>>
>>>>>


-- 
----
Mingmin
