Re: Python IO Connector
Hi Peter. Why don't you use this external library?
https://pypi.org/project/beam-nuggets/
It already uses SQLAlchemy and is pretty easy to use.

On Mon, Jan 6, 2020 at 10:17 PM Luke Cwik wrote:

> Eugene, the JdbcIO output should be updated to support Beam's schema
> format which would allow for "rows" to cross the language boundaries.
>
> If the connector is easy to write and maintain then a native
> implementation makes sense. Maybe the Python version will have an easier
> time supporting splitting and hence could overtake the Java
> implementation in useful features.
>
> On Mon, Jan 6, 2020 at 3:55 PM wrote:
>
>> Apache Airflow went for the DB API approach as well and it seems to
>> have worked well for them. We will likely need to add extras_require
>> entries for each database engine Python package though, which adds some
>> complexity but not a lot.
>>
>> On Jan 6, 2020, at 6:12 PM, Eugene Kirpichov wrote:
>>
>> Agreed with above, it seems prudent to develop a pure-Python connector
>> for something as common as interacting with a database. It's likely easier
>> to achieve an idiomatic API, familiar to non-Beam Python SQL users, within
>> pure Python.
>>
>> Developing a cross-language connector here might be plain impossible,
>> because rows read from a database are (at least in JDBC) not encodable -
>> they require a user's callback to translate to an encodable user type, and
>> the callback can't be in Python because then you have to encode its input
>> before giving it to Python. Same holds for the write transform.
>>
>> Not sure about sqlalchemy though, maybe use plain DB-API
>> https://www.python.org/dev/peps/pep-0249/ instead? Seems like the Python
>> one is more friendly than JDBC in the sense that it actually returns rows
>> as tuples of simple data types.
>>
>> On Mon, Jan 6, 2020 at 1:42 PM Robert Bradshaw wrote:
>>
>>> On Mon, Jan 6, 2020 at 1:39 PM Chamikara Jayalath wrote:
>>>
>>>> Regarding cross-language transforms, we need to add better
>>>> documentation, but for now you'll have to go with existing examples and
>>>> tests. For example,
>>>>
>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/gcp/pubsub.py
>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py
>>>>
>>>> Note that the cross-language transforms feature is currently only
>>>> available for the Flink Runner. Dataflow support is in development.
>>>>
>>>
>>> I think it works with all non-Dataflow runners, with the exception of
>>> the Java and Go Direct runners. (It does work with the Python direct
>>> runner.)
>>>
>>>> I'm fine with developing this natively for Python as well. AFAIK the Java
>>>> JDBC IO connector is not a super-complicated connector and it should be
>>>> fine to make relatively easy to maintain and widely usable connectors
>>>> available in multiple SDKs.
>>>>
>>>
>>> Yes, a case can certainly be made for having native connectors for
>>> particular common/simple sources. (We certainly don't call cross-language
>>> to read text files for example.)
>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>> On Mon, Jan 6, 2020 at 10:56 AM Luke Cwik wrote:
>>>>
>>>>> +Chamikara Jayalath +Heejong Lee
>>>>>
>>>>> On Mon, Jan 6, 2020 at 10:20 AM wrote:
>>>>>
>>>>>> How do I go about doing that? From the docs, it appears cross
>>>>>> language transforms are currently undocumented.
>>>>>> https://beam.apache.org/roadmap/connectors-multi-sdk/
>>>>>> On Jan 6, 2020, at 12:55 PM, Luke Cwik wrote:
>>>>>>
>>>>>> What about using a cross language transform between Python and the
>>>>>> already existing Java JdbcIO transform?
>>>>>>
>>>>>> On Sun, Jan 5, 2020 at 5:18 AM Peter Dannemann wrote:
>>>>>>
>>>>>>> I’d like to develop the Python SDK’s SQL IO connector. I was
>>>>>>> thinking it would be easiest to use sqlalchemy to achieve maximum
>>>>>>> database engine support, but I suppose I could also create an ABC for
>>>>>>> databases that follow the DB API and create subclasses for each
>>>>>>> database engine that override a connect method. What are your
>>>>>>> thoughts on the best way to do this?
>>>>>>>
>>>>>>

--
Lucas Magalhães,
CTO

Paralelo CS - Consultoria e Serviços
Tel: +55 (11) 3090-5557
Cel: +55 (11) 99420-4667
lucas.magalh...@paralelocs.com.br

<http://www.inteligenciaemnegocios.com.br>www.paralelocs.com.br
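
For readers following the DB-API idea raised in this thread (an ABC with a per-engine connect method, and rows coming back as plain tuples), here is a minimal sketch of what that shape could look like. It is not an existing Beam API. The names DbApiConnector, SqliteConnector, read and demo are hypothetical, and the standard-library sqlite3 module stands in for whichever PEP 249 driver a real subclass would wrap.

```python
import abc
import os
import sqlite3
import tempfile


class DbApiConnector(abc.ABC):
    """Hypothetical base class: one subclass per database engine."""

    @abc.abstractmethod
    def connect(self):
        """Return a PEP 249 connection object for this engine."""

    def read(self, query, params=()):
        """Yield each result row as a tuple of simple Python values."""
        conn = self.connect()
        try:
            cursor = conn.cursor()
            cursor.execute(query, params)
            while True:
                # fetchmany is part of the core PEP 249 cursor interface.
                rows = cursor.fetchmany(1000)
                if not rows:
                    break
                for row in rows:
                    yield row
        finally:
            conn.close()


class SqliteConnector(DbApiConnector):
    """Example subclass; sqlite3 is only a stand-in engine here."""

    def __init__(self, path):
        self._path = path

    def connect(self):
        return sqlite3.connect(self._path)


def demo():
    """Create a throwaway database, then read it back via the connector."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.db")
        conn = sqlite3.connect(path)
        # Connection.execute/executemany are sqlite3 shortcuts, used for setup only.
        conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
        conn.executemany(
            "INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")]
        )
        conn.commit()
        conn.close()
        connector = SqliteConnector(path)
        return list(connector.read("SELECT id, name FROM users ORDER BY id"))


if __name__ == "__main__":
    print(demo())
```

Only connect differs per engine, so each extra database would cost one small subclass plus its driver dependency, which matches the trade-off described in the thread.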
Re: Reading from RDB, ParDo or BoundedSource
Hi Pablo. Thanks for that. That is exactly what I needed, and it is much simpler than I thought, hehe.

On Sat, Sep 28, 2019 at 00:31, Pablo Estrada wrote:

> Hi Lucas!
> That makes sense. I saw a question for this on StackOverflow recently.
> Perhaps that was you? [1] - perhaps not, but then you're not the only one
> trying to do this.
>
> I do not know a lot about connecting to RDBs from Python - it seemed to me
> that you'd need to also install ODBC / JDBC drivers, and that's not that
> easy to do on Dataflow. - So you would need to code a special transform
> depending on the database you're reading from.
>
> As far as I know, Postgres also does not have an easy way to read data in
> multiple threads in parallel, so consuming the results of your query would
> be done in a single thread, so you can do it with a relatively simple DoFn.
> Check my answer to the question [2], which has a DoFn for reading from
> Postgres and one for MySQL.
>
> LMK if that helps!
>
> [1]
> https://stackoverflow.com/questions/46528343/how-to-use-gcp-cloud-sql-as-dataflow-source-and-or-sink-with-python/58106722#58106722
> [2] https://stackoverflow.com/a/58106722/1255356
>
> On Fri, Sep 27, 2019 at 4:43 PM Eugene Kirpichov wrote:
>
>> I'm actually very surprised that to this day nobody has written a Python
>> connector for the Python Database API, like JdbcIO.
>> Do we maybe have a way to use JdbcIO from Python via the cross-language
>> connectors stuff?
>>
>> On Fri, Sep 27, 2019 at 4:28 PM Lucas Magalhães <
>> lucas.magalh...@paralelocs.com.br> wrote:
>>
>>> Hi guys.
>>>
>>> Sorry, I forgot to mention that I'm using the Python SDK. It seems that
>>> the Java SDK is more mature, but I have no skill in that language.
>>>
>>> I'm trying to extract data from Postgres (Cloud SQL), do some
>>> aggregations and save the results into BigQuery.
>>>
>>> On Fri, Sep 27, 2019 at 19:21, Pablo Estrada wrote:
>>>
>>>> Hi Lucas!
>>>> Can you share more information about your use case? Java has JdbcIO.
>>>> Maybe that's all you need? Or perhaps you're using Python SDK?
>>>> Best
>>>> -P.
>>>>
>>>> On Fri, Sep 27, 2019 at 3:08 PM Eugene Kirpichov wrote:
>>>>
>>>>> Hi Lucas,
>>>>> Any reason why you can't use JdbcIO?
>>>>> You almost certainly should *not* use BoundedSource, nor Splittable
>>>>> DoFn for this. BoundedSource is obsolete in favor of assembling your
>>>>> connector from regular transforms and/or using an SDF, and SDF is an
>>>>> extremely advanced feature whose primary audience is Beam SDK authors.
>>>>>
>>>>> On Fri, Sep 27, 2019 at 2:52 PM Lucas Magalhães <
>>>>> lucas.magalh...@paralelocs.com.br> wrote:
>>>>>
>>>>>> Hi guys.
>>>>>>
>>>>>> I'm new to Apache Beam and I would like some help understanding some
>>>>>> behaviours.
>>>>>>
>>>>>> 1. Is there a performance issue when I read data from a
>>>>>> relational database using a ParDo instead of a BoundedSource?
>>>>>>
>>>>>> 2. If I implement a BoundedSource, how does Beam manage the
>>>>>> connection? Do I need to open and close it in every method, like
>>>>>> split, read, estimate size and so on?
>>>>>>
>>>>>> 3. I read something about splittable DoFn but I didn't find
>>>>>> instructions on how to implement one. Does anyone have something
>>>>>> about it?
>>>>>>
>>>>>> Thanks
>>>>>>
Re: Reading from RDB, ParDo or BoundedSource
Hi guys.

Sorry, I forgot to mention that I'm using the Python SDK. It seems that the Java SDK is more mature, but I have no skill in that language.

I'm trying to extract data from Postgres (Cloud SQL), do some aggregations and save the results into BigQuery.

On Fri, Sep 27, 2019 at 19:21, Pablo Estrada wrote:

> Hi Lucas!
> Can you share more information about your use case? Java has JdbcIO. Maybe
> that's all you need? Or perhaps you're using Python SDK?
> Best
> -P.
>
> On Fri, Sep 27, 2019 at 3:08 PM Eugene Kirpichov wrote:
>
>> Hi Lucas,
>> Any reason why you can't use JdbcIO?
>> You almost certainly should *not* use BoundedSource, nor Splittable DoFn
>> for this. BoundedSource is obsolete in favor of assembling your connector
>> from regular transforms and/or using an SDF, and SDF is an extremely
>> advanced feature whose primary audience is Beam SDK authors.
>>
>> On Fri, Sep 27, 2019 at 2:52 PM Lucas Magalhães <
>> lucas.magalh...@paralelocs.com.br> wrote:
>>
>>> Hi guys.
>>>
>>> I'm new to Apache Beam and I would like some help understanding some
>>> behaviours.
>>>
>>> 1. Is there a performance issue when I read data from a
>>> relational database using a ParDo instead of a BoundedSource?
>>>
>>> 2. If I implement a BoundedSource, how does Beam manage the
>>> connection? Do I need to open and close it in every method, like split,
>>> read, estimate size and so on?
>>>
>>> 3. I read something about splittable DoFn but I didn't find instructions
>>> on how to implement one. Does anyone have something about it?
>>>
>>> Thanks
>>>
Reading from RDB, ParDo or BoundedSource
Hi guys.

I'm new to Apache Beam and I would like some help understanding some behaviours.

1. Is there a performance issue when I read data from a relational database using a ParDo instead of a BoundedSource?

2. If I implement a BoundedSource, how does Beam manage the connection? Do I need to open and close it in every method, like split, read, estimate size and so on?

3. I read something about splittable DoFn but I didn't find instructions on how to implement one. Does anyone have something about it?

Thanks
Re: MQTT to Python SDK
Thanks Altay. Do you know where I could find more about cross language transforms? Documentation and examples as well.

Thanks again

On Mon, Sep 16, 2019 at 4:00 PM Ahmet Altay wrote:

> A framework for python sdk to use a native unbounded connector does not
> exist yet. You might be able to use the same connector from Java using
> cross language transforms.
>
> /cc +Chamikara Jayalath
>
> On Mon, Sep 16, 2019 at 11:00 AM Lucas Magalhães <
> lucas.magalh...@paralelocs.com.br> wrote:
>
>> Hello dears!
>>
>> I'm starting a new project here and the main source is MQTT.
>>
>> I couldn't find any documentation about how to develop an unbounded
>> connector.
>>
>> Could anyone send me some instructions or guidelines?
>>
>> Thanks a lot
>>
>> --
>> Lucas Magalhães,
>> CTO
>>
>> Paralelo CS - Consultoria e Serviços
>> Tel: +55 (11) 3090-5557
>> Cel: +55 (11) 99420-4667
>> lucas.magalh...@paralelocs.com.br
>>
>> <http://www.inteligenciaemnegocios.com.br>www.paralelocs.com.br
>>
>

--
Lucas Magalhães,
CTO

Paralelo CS - Consultoria e Serviços
Tel: +55 (11) 3090-5557
Cel: +55 (11) 99420-4667
lucas.magalh...@paralelocs.com.br

<http://www.inteligenciaemnegocios.com.br>www.paralelocs.com.br
MQTT to Python SDK
Hello dears!

I'm starting a new project here and the main source is MQTT.

I couldn't find any documentation about how to develop an unbounded connector.

Could anyone send me some instructions or guidelines?

Thanks a lot

--
Lucas Magalhães,
CTO

Paralelo CS - Consultoria e Serviços
Tel: +55 (11) 3090-5557
Cel: +55 (11) 99420-4667
lucas.magalh...@paralelocs.com.br

<http://www.inteligenciaemnegocios.com.br>www.paralelocs.com.br