Re: a fix to send RabbitMq messages

2019-05-27 Thread Nicolas Delsaux

I've created https://issues.apache.org/jira/browse/BEAM-7433 to track
that bug.

Unfortunately, I don't have much detail to provide ... sorry.


On 2019/05/24 16:37:36, Kenneth Knowles  wrote:
> Coders are all set up by the SDK before the pipeline is given to a runner,
> so that sounds like a strange issue. Would you also file a Jira ticket
> about your experience with the coder registry and the DataflowRunner?
>
> On Fri, May 24, 2019 at 5:26 AM Nicolas Delsaux 
> wrote:
>
> > Thanks, PR is started (https://github.com/apache/beam/pull/8677), and
> > I've set both Alexey and you as potential reviewers.
> >
> > On 24/05/2019 at 13:55, Jean-Baptiste Onofré wrote:
> > > Hi,
> > >
> > > You can create a pull request, I will do the review.
> > >
> > > The coder is set on the RabbitMqIO PTransform, so it should work.
> > >
> > > AFAIR, we have a Jira about that and I already started to check. Not
> > > yet completed.
> > >
> > > Regards
> > > JB
> > >
> > > On 24/05/2019 11:01, Nicolas Delsaux wrote:
> > >> Hi all
> > >>
> > >> I'm currently evaluating Apache Beam to transfer messages from
> > >> RabbitMQ to Kafka, with some transforms in between.
> > >>
> > >> Doing so, I've discovered some differences in behaviour between the
> > >> direct runner and the Google Dataflow runner.
> > >>
> > >> But first, a small introduction to what I know.
> > >>
> > >> From what I understand, elements transmitted between two different
> > >> transforms are serialized/deserialized.
> > >>
> > >> This (de)serialization process is normally managed by a Coder, the
> > >> most used of which is obviously SerializableCoder, which takes a
> > >> Serializable object and (de)serializes it using standard Java
> > >> serialization.
> > >>
> > >> On the direct runner, I had issues with RabbitMQ messages, as their
> > >> headers contain objects that are LongStrings; LongString is an
> > >> interface implemented solely by a private static class of the
> > >> RabbitMQ client, and is used for large text messages.
> > >>
> > >> So I wrote a RabbitMqMessageCoder and installed it in my pipeline
> > >> (using
> > >> pipeline.getCoderRegistry().registerCoderForClass(RabbitMqMessage.class,
> > >> new MyCoder())).
> > >>
> > >> And it worked! Well, not on the Dataflow runner.
> > >>
> > >> Indeed, it seems the Dataflow runner doesn't use this coder registry
> > >> mechanism (for reasons I absolutely don't understand).
> > >>
> > >> So my fix didn't work.
> > >>
> > >> After various tries, I finally gave up and directly modified the
> > >> RabbitMqIO class (from Apache Beam) to handle my case.
> > >>
> > >> This fix is available on my Beam fork on GitHub, and I would like to
> > >> have it integrated.
> > >>
> > >> What is the procedure to do so?
> > >>
> > >> Thanks!
> > >>
> > >
> >
>


Re: a fix to send RabbitMq messages

2019-05-24 Thread Kenneth Knowles
Coders are all set up by the SDK before the pipeline is given to a runner,
so that sounds like a strange issue. Would you also file a Jira ticket
about your experience with the coder registry and the DataflowRunner?
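
As a minimal sketch of that construction-time setup (assuming the Beam Java SDK's RabbitMqIO; the URI, queue name and class name below are placeholders, not code from this thread), a coder can also be pinned directly on the PCollection, so no registry lookup is involved on any runner:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.io.rabbitmq.RabbitMqIO;
import org.apache.beam.sdk.io.rabbitmq.RabbitMqMessage;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class ExplicitCoderPipeline {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<RabbitMqMessage> messages =
        pipeline
            .apply("ReadFromRabbitMq",
                RabbitMqIO.read()
                    .withUri("amqp://guest:guest@localhost:5672")  // placeholder broker
                    .withQueue("my-queue"))                        // placeholder queue
            // setCoder() fixes the coder for this PCollection while the pipeline
            // is being built, before any runner sees it. In the scenario discussed
            // in this thread, a custom coder would replace SerializableCoder, since
            // Java serialization fails on the LongString header values.
            .setCoder(SerializableCoder.of(RabbitMqMessage.class));

    pipeline.run().waitUntilFinish();
  }
}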

On Fri, May 24, 2019 at 5:26 AM Nicolas Delsaux 
wrote:

> Thanks, PR is started (https://github.com/apache/beam/pull/8677), and
> I've set both Alexey and you as potential reviewers.
>
> On 24/05/2019 at 13:55, Jean-Baptiste Onofré wrote:
> > Hi,
> >
> > You can create a pull request, I will do the review.
> >
> > The coder is set on the RabbitMqIO PTransform, so it should work.
> >
> > AFAIR, we have a Jira about that and I already started to check. Not
> > yet completed.
> >
> > Regards
> > JB
> >
> > On 24/05/2019 11:01, Nicolas Delsaux wrote:
> >> Hi all
> >>
> >> I'm currently evaluating Apache Beam to transfer messages from RabbitMQ
> >> to Kafka, with some transforms in between.
> >>
> >> Doing so, I've discovered some differences in behaviour between the
> >> direct runner and the Google Dataflow runner.
> >>
> >> But first, a small introduction to what I know.
> >>
> >> From what I understand, elements transmitted between two different
> >> transforms are serialized/deserialized.
> >>
> >> This (de)serialization process is normally managed by a Coder, the most
> >> used of which is obviously SerializableCoder, which takes a Serializable
> >> object and (de)serializes it using standard Java serialization.
> >>
> >> On the direct runner, I had issues with RabbitMQ messages, as their
> >> headers contain objects that are LongStrings; LongString is an interface
> >> implemented solely by a private static class of the RabbitMQ client, and
> >> is used for large text messages.
> >>
> >> So I wrote a RabbitMqMessageCoder and installed it in my pipeline
> >> (using
> >> pipeline.getCoderRegistry().registerCoderForClass(RabbitMqMessage.class,
> >> new MyCoder())).
> >>
> >> And it worked! Well, not on the Dataflow runner.
> >>
> >> Indeed, it seems the Dataflow runner doesn't use this coder registry
> >> mechanism (for reasons I absolutely don't understand).
> >>
> >> So my fix didn't work.
> >>
> >> After various tries, I finally gave up and directly modified the
> >> RabbitMqIO class (from Apache Beam) to handle my case.
> >>
> >> This fix is available on my Beam fork on GitHub, and I would like to
> >> have it integrated.
> >>
> >> What is the procedure to do so?
> >>
> >> Thanks!
> >>
>


Re: a fix to send RabbitMq messages

2019-05-24 Thread Nicolas Delsaux

Thanks, PR is started (https://github.com/apache/beam/pull/8677), and
I've set both Alexey and you as potential reviewers.

On 24/05/2019 at 13:55, Jean-Baptiste Onofré wrote:

Hi,

You can create a pull request, I will do the review.

The coder is set on the RabbitMqIO PTransform, so it should work.

AFAIR, we have a Jira about that and I already started to check. Not yet
completed.

Regards
JB

On 24/05/2019 11:01, Nicolas Delsaux wrote:

Hi all

I'm currently evaluating Apache Beam to transfer messages from RabbitMQ
to Kafka, with some transforms in between.

Doing so, I've discovered some differences in behaviour between the
direct runner and the Google Dataflow runner.

But first, a small introduction to what I know.

From what I understand, elements transmitted between two different
transforms are serialized/deserialized.

This (de)serialization process is normally managed by a Coder, the most
used of which is obviously SerializableCoder, which takes a Serializable
object and (de)serializes it using standard Java serialization.
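
(As a tiny illustration of that round trip, with a hypothetical Event type rather than anything from this thread, SerializableCoder simply pushes the value through standard Java serialization; this works only while every field is itself Serializable, which is exactly what breaks below:)

import java.io.Serializable;
import org.apache.beam.sdk.coders.SerializableCoder;
import org.apache.beam.sdk.util.CoderUtils;

// Hypothetical element type, used only for this sketch.
public class Event implements Serializable {
  String id;
  long timestampMillis;

  public static void main(String[] args) throws Exception {
    Event in = new Event();
    in.id = "abc";
    in.timestampMillis = 1234L;

    SerializableCoder<Event> coder = SerializableCoder.of(Event.class);
    // Encode and decode exactly as a runner does between two transforms.
    byte[] bytes = CoderUtils.encodeToByteArray(coder, in);
    Event out = CoderUtils.decodeFromByteArray(coder, bytes);
    System.out.println(out.id + " @ " + out.timestampMillis);
  }
}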

On the direct runner, I had issues with RabbitMQ messages, as their
headers contain objects that are LongStrings; LongString is an interface
implemented solely by a private static class of the RabbitMQ client, and
is used for large text messages.

So I wrote a RabbitMqMessageCoder and installed it in my pipeline (using
pipeline.getCoderRegistry().registerCoderForClass(RabbitMqMessage.class,
new MyCoder())).
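
(For illustration only, not the code from the fork or the PR: a rough sketch of the kind of conversion involved. A custom coder can Java-serialize a sanitized copy of the headers in which the non-Serializable LongString values are replaced by plain Strings; the class below covers only a headers map, and a real RabbitMqMessage coder would wrap the same idea around the whole message before being registered as above:)

import com.rabbitmq.client.LongString;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import org.apache.beam.sdk.coders.CoderException;
import org.apache.beam.sdk.coders.CustomCoder;

public class SanitizedHeadersCoder extends CustomCoder<Map<String, Object>> {

  @Override
  public void encode(Map<String, Object> headers, OutputStream out)
      throws CoderException, IOException {
    Map<String, Object> copy = new HashMap<>();
    for (Map.Entry<String, Object> entry : headers.entrySet()) {
      Object value = entry.getValue();
      // LongString's only implementation is a private static class of the
      // RabbitMQ client and is not Serializable; keep its string form instead.
      copy.put(entry.getKey(), value instanceof LongString ? value.toString() : value);
    }
    ObjectOutputStream oos = new ObjectOutputStream(out);
    oos.writeObject(copy);
    oos.flush();
  }

  @Override
  @SuppressWarnings("unchecked")
  public Map<String, Object> decode(InputStream in) throws CoderException, IOException {
    try {
      return (Map<String, Object>) new ObjectInputStream(in).readObject();
    } catch (ClassNotFoundException e) {
      throw new IOException("Unable to decode sanitized RabbitMQ headers", e);
    }
  }
}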

And it worked! Well, not on the Dataflow runner.

Indeed, it seems the Dataflow runner doesn't use this coder registry
mechanism (for reasons I absolutely don't understand).

So my fix didn't work.

After various tries, I finally gave up and directly modified the
RabbitMqIO class (from Apache Beam) to handle my case.

This fix is available on my Beam fork on GitHub, and I would like to
have it integrated.

What is the procedure to do so?

Thanks!



Re: a fix to send RabbitMq messages

2019-05-24 Thread Jean-Baptiste Onofré
Hi,

You can create a pull request, I will do the review.

The coder is set on the RabbitMqIO PTransform, so it should work.
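
In general terms (the transform and types below are just stand-ins to show the pattern, not RabbitMqIO's actual source), a composite transform can pin the coder of its output inside expand(), and a coder attached this way is part of the constructed pipeline rather than the result of a registry lookup:

import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;

// Illustrative composite transform: the output coder is fixed in expand(),
// so downstream stages never need to consult the coder registry for it.
public class ReadGreetings extends PTransform<PBegin, PCollection<String>> {
  @Override
  public PCollection<String> expand(PBegin input) {
    return input
        .apply(Create.of("hello", "world"))
        .setCoder(StringUtf8Coder.of());
  }
}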

AFAIR, we have a Jira about that and I already started to check. Not yet
completed.

Regards
JB

On 24/05/2019 11:01, Nicolas Delsaux wrote:
> Hi all
> 
> I'm currently evaluating Apache Beam to transfer messages from RabbitMQ
> to Kafka, with some transforms in between.
> 
> Doing so, I've discovered some differences in behaviour between the
> direct runner and the Google Dataflow runner.
> 
> But first, a small introduction to what I know.
> 
> From what I understand, elements transmitted between two different
> transforms are serialized/deserialized.
> 
> This (de)serialization process is normally managed by a Coder, the most
> used of which is obviously SerializableCoder, which takes a Serializable
> object and (de)serializes it using standard Java serialization.
> 
> On the direct runner, I had issues with RabbitMQ messages, as their
> headers contain objects that are LongStrings; LongString is an interface
> implemented solely by a private static class of the RabbitMQ client, and
> is used for large text messages.
> 
> So I wrote a RabbitMqMessageCoder and installed it in my pipeline (using
> pipeline.getCoderRegistry().registerCoderForClass(RabbitMqMessage.class,
> new MyCoder())).
> 
> And it worked! Well, not on the Dataflow runner.
> 
> Indeed, it seems the Dataflow runner doesn't use this coder registry
> mechanism (for reasons I absolutely don't understand).
> 
> So my fix didn't work.
> 
> After various tries, I finally gave up and directly modified the
> RabbitMqIO class (from Apache Beam) to handle my case.
> 
> This fix is available on my Beam fork on GitHub, and I would like to
> have it integrated.
> 
> What is the procedure to do so?
> 
> Thanks!
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: a fix to send RabbitMq messages

2019-05-24 Thread Alexey Romanenko
Hi Nicolas,

Thank you for your contribution. I'd recommend that you start from this page:
https://beam.apache.org/contribute/#share-your-intent

I'll be glad to review your PR once it is ready.

> On 24 May 2019, at 11:01, Nicolas Delsaux  wrote:
> 
> Hi all
> 
> I'm currently evaluating Apache Beam to transfer messages from RabbitMQ
> to Kafka, with some transforms in between.
> 
> Doing so, I've discovered some differences in behaviour between the
> direct runner and the Google Dataflow runner.
> 
> But first, a small introduction to what I know.
> 
> From what I understand, elements transmitted between two different
> transforms are serialized/deserialized.
> 
> This (de)serialization process is normally managed by a Coder, the most
> used of which is obviously SerializableCoder, which takes a Serializable
> object and (de)serializes it using standard Java serialization.
> 
> On the direct runner, I had issues with RabbitMQ messages, as their
> headers contain objects that are LongStrings; LongString is an interface
> implemented solely by a private static class of the RabbitMQ client, and
> is used for large text messages.
> 
> So I wrote a RabbitMqMessageCoder and installed it in my pipeline (using
> pipeline.getCoderRegistry().registerCoderForClass(RabbitMqMessage.class,
> new MyCoder())).
> 
> And it worked! Well, not on the Dataflow runner.
> 
> Indeed, it seems the Dataflow runner doesn't use this coder registry
> mechanism (for reasons I absolutely don't understand).
> 
> So my fix didn't work.
> 
> After various tries, I finally gave up and directly modified the
> RabbitMqIO class (from Apache Beam) to handle my case.
> 
> This fix is available on my Beam fork on GitHub, and I would like to
> have it integrated.
> 
> What is the procedure to do so?
> 
> Thanks!
>