Hi Peter. Why don't you use this external library? https://pypi.org/project/beam-nuggets/ It already uses SQLAlchemy and is pretty easy to use.
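For comparison, the plain DB-API route raised in the thread below can be sketched in a few lines. This is only an illustration, not Beam code: `DbApiSource`, `SqliteSource`, `read`, and `demo` are hypothetical names, and sqlite3 stands in for any PEP 249 driver. Each engine subclass only overrides `connect`, and rows come back as plain tuples of simple types.

```python
import abc
import os
import sqlite3
import tempfile
from contextlib import closing


class DbApiSource(abc.ABC):
    """Hypothetical base class; subclasses supply a PEP 249 connection."""

    @abc.abstractmethod
    def connect(self):
        """Return a new DB-API 2.0 connection for one database engine."""

    def read(self, query, params=(), batch_size=100):
        """Yield result rows one at a time, fetching them in batches."""
        with closing(self.connect()) as conn:
            with closing(conn.cursor()) as cursor:
                cursor.execute(query, params)
                while True:
                    rows = cursor.fetchmany(batch_size)
                    if not rows:
                        break
                    yield from rows


class SqliteSource(DbApiSource):
    """Example engine subclass: only connect() is engine-specific."""

    def __init__(self, path):
        self.path = path

    def connect(self):
        return sqlite3.connect(self.path)


def demo():
    """Create a throwaway SQLite file, then read it back through the ABC."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        with closing(sqlite3.connect(path)) as conn:
            # conn.execute is a sqlite3 shortcut, used here only for setup.
            conn.execute("CREATE TABLE months (num INTEGER, name TEXT)")
            conn.executemany(
                "INSERT INTO months VALUES (?, ?)", [(1, "Jan"), (2, "Feb")]
            )
            conn.commit()
        query = "SELECT num, name FROM months ORDER BY num"
        return list(SqliteSource(path).read(query))
```

Calling `demo()` returns the two inserted rows as tuples. The placeholder style (`?` here) differs between drivers, so a real connector would also have to account for each driver's `paramstyle`.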
On Mon, Jan 6, 2020 at 10:17 PM Luke Cwik <lc...@google.com> wrote:

> Eugene, the JdbcIO output should be updated to support Beam's schema
> format, which would allow "rows" to cross the language boundaries.
>
> If the connector is easy to write and maintain, then it makes sense to
> go native. Maybe the Python version will have an easier time supporting
> splitting and hence could overtake the Java implementation in useful
> features.
>
> On Mon, Jan 6, 2020 at 3:55 PM <pbd...@gmail.com> wrote:
>
>> Apache Airflow went for the DB API approach as well, and it seems to
>> have worked well for them. We will likely need to add an extras_require
>> entry for each database engine's Python package, though, which adds some
>> complexity but not a lot.
>>
>> On Jan 6, 2020, at 6:12 PM, Eugene Kirpichov <j...@google.com> wrote:
>>
>> Agreed with above, it seems prudent to develop a pure-Python connector
>> for something as common as interacting with a database. It's likely easier
>> to achieve an idiomatic API, familiar to non-Beam Python SQL users, within
>> pure Python.
>>
>> Developing a cross-language connector here might be plain impossible,
>> because rows read from a database are (at least in JDBC) not encodable -
>> they require a user's callback to translate to an encodable user type, and
>> the callback can't be in Python because then you have to encode its input
>> before giving it to Python. Same holds for the write transform.
>>
>> Not sure about sqlalchemy though, maybe use plain DB-API
>> https://www.python.org/dev/peps/pep-0249/ instead? Seems like the Python
>> one is more friendly than JDBC in the sense that it actually returns rows
>> as tuples of simple data types.
>>
>> On Mon, Jan 6, 2020 at 1:42 PM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> On Mon, Jan 6, 2020 at 1:39 PM Chamikara Jayalath <chamik...@google.com>
>>> wrote:
>>>
>>>> Regarding cross-language transforms, we need to add better
>>>> documentation, but for now you'll have to go with existing examples and
>>>> tests. For example,
>>>>
>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/gcp/pubsub.py
>>>>
>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py
>>>>
>>>> Note that the cross-language transforms feature is currently only
>>>> available for the Flink Runner. Dataflow support is in development.
>>>>
>>>
>>> I think it works with all non-Dataflow runners, with the exception of
>>> the Java and Go Direct runners. (It does work with the Python direct
>>> runner.)
>>>
>>>> I'm fine with developing this natively for Python as well. AFAIK the
>>>> Java JDBC IO connector is not a super-complicated connector, and it
>>>> should be fine to make relatively easy to maintain and widely usable
>>>> connectors available in multiple SDKs.
>>>>
>>>
>>> Yes, a case can certainly be made for having native connectors for
>>> particular common/simple sources. (We certainly don't call cross-language
>>> to read text files, for example.)
>>>
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>> On Mon, Jan 6, 2020 at 10:56 AM Luke Cwik <lc...@google.com> wrote:
>>>>
>>>>> +Chamikara Jayalath <chamik...@google.com> +Heejong Lee
>>>>> <heej...@google.com>
>>>>>
>>>>> On Mon, Jan 6, 2020 at 10:20 AM <pbd...@gmail.com> wrote:
>>>>>
>>>>>> How do I go about doing that? From the docs, it appears
>>>>>> cross-language transforms are currently undocumented.
>>>>>> https://beam.apache.org/roadmap/connectors-multi-sdk/
>>>>>> On Jan 6, 2020, at 12:55 PM, Luke Cwik <lc...@google.com> wrote:
>>>>>>
>>>>>> What about using a cross language transform between Python and the
>>>>>> already existing Java JdbcIO transform?
>>>>>>
>>>>>> On Sun, Jan 5, 2020 at 5:18 AM Peter Dannemann <pbd...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I’d like to develop the Python SDK’s SQL IO connector. I was
>>>>>>> thinking it would be easiest to use sqlalchemy to achieve maximum
>>>>>>> database engine support, but I suppose I could also create an ABC
>>>>>>> for databases that follow the DB API and create subclasses for each
>>>>>>> database engine that override a connect method. What are your
>>>>>>> thoughts on the best way to do this?
>>>>>>>
>>>>>>

--
Lucas Magalhães,
CTO

Paralelo CS - Consultoria e Serviços
Tel: +55 (11) 3090-5557
Cel: +55 (11) 99420-4667
lucas.magalh...@paralelocs.com.br

<http://www.inteligenciaemnegocios.com.br>www.paralelocs.com.br