[ 
https://issues.apache.org/jira/browse/PIG-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812312#comment-13812312
 ] 

Mridul Jain commented on PIG-3453:
----------------------------------

We have been planning to do that comparison, but it would help if someone has 
already done some benchmarking. 
Just last week I was trying to map our problem statement, already expressed in 
Pig, to Trident. At a high level we are trying to do two things. First, express 
a Storm topology via Pig. Second, have a mixed mode where we can express 
online (Storm) and offline (MapReduce) jobs via the same Pig scripts and pass 
data between the two, as many of our systems require offline processing of data 
(long-running ML jobs) in addition to fast-path processing of the incoming data 
via Storm. We do have some literature and architecture worked out for this, 
which also tries to maintain semantic as well as syntactic compatibility with 
existing Pig. 
For all of the above, we plan to convert from Pig to Trident directly, for the 
following reasons:
* Trident supports batch processing (which was available via transactional 
topologies in older versions of Storm, but has now been subsumed by Trident), 
which should provide higher throughput for the whole pipeline than 
tuple-by-tuple processing in vanilla Storm. Also, certain operations such as 
merge and join seem to map naturally onto Trident's batch semantics.
* Trident semantics seem to fit very well with traditional Pig UDFs. As 
someone pointed out above, conversion to and from existing Pig UDFs should be 
easy.
* The only catch is that Trident doesn't support multiple output streams for 
expressing rich connections (i.e. topologies that need complex workflows/DAGs), 
though I have filed a bug for this (as Nathan asked me to): 
https://groups.google.com/forum/#!searchin/storm-user/mridul/storm-user/G8POD1Hb89I/hcGYH1nf230J
Anyway, Pig also supports only linear chains, so I suppose this is not a 
problem as far as the programming model goes.
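
To make the UDF point above a bit more concrete, here is a minimal sketch of 
the kind of adapter I have in mind. It is plain Python with stand-in classes, 
not the real Pig or Trident APIs: ListCollector, UpperUdf, 
PigUdfAsTridentFunction and run_batch are all hypothetical names. It only 
assumes that a Pig-style UDF maps one input tuple to one value, and that a 
Trident-style function is handed a tuple plus a collector and emits zero or 
more tuples.

```python
# Stand-ins only: these classes mimic the *shape* of a Pig-style UDF
# (one tuple in, one value out) and a Trident-style function
# (tuple plus collector in, emitted tuples out). They are not the real APIs.

class ListCollector:
    """Hypothetical collector that records every emitted tuple."""

    def __init__(self):
        self.emitted = []

    def emit(self, values):
        self.emitted.append(tuple(values))


class UpperUdf:
    """Pig-style UDF: takes one input tuple, returns one value (or None)."""

    def exec(self, input_tuple):
        if not input_tuple or input_tuple[0] is None:
            return None
        return str(input_tuple[0]).upper()


class PigUdfAsTridentFunction:
    """Adapter: wraps a Pig-style UDF so it can be driven Trident-style."""

    def __init__(self, udf):
        self.udf = udf

    def execute(self, input_tuple, collector):
        result = self.udf.exec(input_tuple)
        if result is not None:
            # Assumption: None results are dropped rather than emitted.
            collector.emit([result])


def run_batch(function, batch):
    """Apply the function to every tuple of one batch and collect the output."""
    collector = ListCollector()
    for input_tuple in batch:
        function.execute(input_tuple, collector)
    return collector.emitted
```

Calling run_batch(PigUdfAsTridentFunction(UpperUdf()), batch) applies the 
wrapped UDF to a whole batch at once. Dropping None results is just a choice 
made for this sketch; a real conversion would have to decide how Pig nulls 
should be carried through.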

But if anyone sees potential pitfalls in converting to Trident rather than 
vanilla Storm, please do point them out.

Mridul

> Implement a Storm backend to Pig
> --------------------------------
>
>                 Key: PIG-3453
>                 URL: https://issues.apache.org/jira/browse/PIG-3453
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.13.0
>            Reporter: Pradeep Gollakota
>            Assignee: Jacob Perkins
>              Labels: storm
>             Fix For: 0.13.0
>
>         Attachments: storm-integration.patch
>
>
> There is a lot of interest around implementing a Storm backend to Pig for 
> streaming processing. The proposal and initial discussions can be found at 
> https://cwiki.apache.org/confluence/display/PIG/Pig+on+Storm+Proposal



--
This message was sent by Atlassian JIRA
(v6.1#6144)
