+1. It is always good to have new ways to ingest data into an Iceberg table.

- Ajantha

On Wed, May 22, 2024 at 7:32 PM Jean-Baptiste Onofré <[email protected]> wrote:

> Hi Omar,
>
> That's the plan (see the last section in my previous email). Just
> wanted to bring some attention in the Iceberg community :)
>
> Regards
> JB
>
> On Wed, May 22, 2024 at 10:01 AM Omar Al-Safi <[email protected]> wrote:
> >
> > IMO the Camel Iceberg component should live in the Camel repo. It can be
> > part of the Camel components registry.
> >
> > On Wed, May 22, 2024 at 9:58 AM Jean-Baptiste Onofré <[email protected]> wrote:
> >>
> >> Hi Manish
> >>
> >> No, Camel is not an alternative to Spark or Flink: Camel is not a
> >> query engine. It's more a "complement" to Kafka Connect.
> >>
> >> Regards
> >> JB
> >>
> >> On Wed, May 22, 2024 at 7:09 AM Manish Malhotra
> >> <[email protected]> wrote:
> >> >
> >> > Can Camel be used as an alternative to Flink?
> >> >
> >> >
> >> > On Tue, May 21, 2024 at 10:17 AM Ryan Blue <[email protected]> wrote:
> >> >>
> >> >> This is an interesting idea. What is the use case and where should
> >> >> this live? I'm unfamiliar with Camel and I'm not sure what the normal
> >> >> thing is. At least in the Iceberg community, we generally avoid adding
> >> >> connectors unless there is a clear use case and demand for them. We
> >> >> don't want to add code that needs to be maintained but isn't used.
> >> >>
> >> >> On Tue, May 21, 2024 at 10:15 AM Yufei Gu <[email protected]> wrote:
> >> >>>
> >> >>> Hi JB,
> >> >>>
> >> >>> Thanks for sharing. Got a few questions:
> >> >>>
> >> >>> Does Apache Camel rely on other engines, e.g., Spark or Flink, for
> >> >>> any processing, or is it fully self-contained?
> >> >>> What are the potential challenges or limitations you foresee? For
> >> >>> example, does it generate too many commits and/or small files
> >> >>> considering its use cases (IoT, event streaming)? Can Camel cache
> >> >>> ingestion data, and write it to the Iceberg table as a batch?
> >> >>> How do you recommend handling schema evolution in Iceberg tables
> >> >>> when integrating with Camel routes?
> >> >>>
> >> >>> Yufei
> >> >>>
> >> >>>
> >> >>> On Tue, May 21, 2024 at 6:06 AM Jean-Baptiste Onofré <[email protected]> wrote:
> >> >>>>
> >> >>>> Hi folks,
> >> >>>>
> >> >>>> I'm working on an Iceberg component for Apache Camel:
> >> >>>> https://github.com/jbonofre/iceberg/tree/CAMEL/camel/camel-iceberg/src/main
> >> >>>>
> >> >>>> Apache Camel is an integration framework, supporting a lot of
> >> >>>> components and EIPs (Enterprise Integration Patterns, like Content
> >> >>>> Based Router, Splitter, Aggregator, Content Enricher, ...).
> >> >>>> Camel is very popular in a lot of use cases, like IoT, system
> >> >>>> integration, event streaming, ...
> >> >>>>
> >> >>>> The component provides:
> >> >>>> - a Camel consumer endpoint (from) to read data from Iceberg
> >> >>>> tables/views (scan) and create a Camel exchange
> >> >>>> - a Camel producer endpoint (to) to write data (from a Camel
> >> >>>> exchange) to Iceberg tables/views
> >> >>>>
> >> >>>> For instance, you can write a Camel route like this (using the
> >> >>>> spring/blueprint DSL for instance):
> >> >>>>
> >> >>>> <from uri="jms:queue:foo"/>
> >> >>>> <process ref="#convertToIcebergRecords"/> <!-- optional depending on the exchange message body -->
> >> >>>> <to uri="iceberg:my_table?catalog=#ref"/>
> >> >>>>
> >> >>>> This route is event driven, consuming messages from the foo JMS
> >> >>>> queue (from Apache ActiveMQ for instance), and writing the message
> >> >>>> body to the my_table Iceberg table (it's possible to use a router or
> >> >>>> multicast EIPs to send the exchange to different tables).
> >> >>>> NB: for the from (consumer endpoint), you can use any Camel
> >> >>>> component (https://camel.apache.org/components/4.4.x/).
> >> >>>>
> >> >>>> You can also consume (scan) data from an Iceberg table, and send
> >> >>>> the generated Exchange to any endpoint/route:
> >> >>>>
> >> >>>> <from uri="iceberg:my_table?catalog=#ref"/>
> >> >>>> <process ref="#convertFromIcebergRecords"/> <!-- optional depending on the next steps in the route -->
> >> >>>> <wireTap uri="direct:tap"/>
> >> >>>> <to uri="mongodb:myDB?database=mydb&amp;collection=foo&amp;operation=insert"/>
> >> >>>>
> >> >>>> This route generates exchanges from the my_table Iceberg table, uses
> >> >>>> the wiretap EIP and stores the data into a MongoDB
> >> >>>> database/collection.
> >> >>>>
> >> >>>> Although I started the component in the Iceberg repo, I think it
> >> >>>> would make more sense to have it in Camel (just as Apache Beam
> >> >>>> contains the Iceberg IO).
> >> >>>> Thoughts?
> >> >>>>
> >> >>>> Comments are welcome!
> >> >>>>
> >> >>>> NB: on a related topic, I created
> >> >>>> https://github.com/apache/iceberg/pull/10365
> >> >>>>
> >> >>>> Regards
> >> >>>> JB
> >> >>
> >> >>
> >> >> --
> >> >> Ryan Blue
> >> >> Tabular
