[ 
https://issues.apache.org/jira/browse/PIG-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777408#comment-13777408
 ] 

Jacob Perkins commented on PIG-3453:
------------------------------------

[~boneill], I haven't thought too hard about distinct yet myself. Since I'm 
really only thinking about Trident and not Storm in general, doing a distinct 
strictly within a batch is one straightforward option. Unfortunately, from a 
user standpoint, I think this would be (a) minimally useful and (b) confusing. 
Instead, could we implement something like an approximate distinct using an LRU 
cache? Maybe even go so far as to implement a SQF (which I haven't read in its 
entirety yet): http://www.vldb.org/pvldb/vol6/p589-dutta.pdf
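
To make the LRU idea concrete, here is a minimal sketch of an approximate 
distinct built on java.util.LinkedHashMap in access order. The class name 
LruDistinct, the method firstSighting, and the capacity parameter are all 
hypothetical, made up for illustration. Nothing here is Pig or Trident code, 
and it is not the SQF from the paper. It only shows the trade-off: memory is 
bounded, but a key that falls out of the cache will be emitted again if it 
reappears.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical approximate-distinct filter that remembers only recent keys. */
class LruDistinct<K> {
    private final Map<K, Boolean> recent;

    LruDistinct(final int capacity) {
        // accessOrder = true keeps entries ordered from least to most
        // recently used, so the "eldest" entry is the LRU one.
        this.recent = new LinkedHashMap<K, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Boolean> eldest) {
                // Evict the LRU key once the cache grows past its capacity.
                return size() > capacity;
            }
        };
    }

    /** Returns true if the key is not currently remembered (emit it). */
    boolean firstSighting(K key) {
        // get() also refreshes the recency of a key that is still cached.
        if (recent.get(key) != null) {
            return false;
        }
        recent.put(key, Boolean.TRUE);
        return true;
    }
}
```

A stream operator would call firstSighting on each tuple's key and drop the 
tuple when it returns false. The capacity decides how far back duplicates are 
caught, which is exactly the "approximate" part a user would need to 
understand.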

Also, what about order by? In what sense is an unbounded stream ordered?

I absolutely do not want to tie the Storm/Trident execution engine to an 
external data store such as Cassandra. Pig is supposed to be backend agnostic. 
Maybe the -default- tap and sink can be Kafka (tap) and Cassandra (sink). 
Finally, it should be possible to run a Pig script in Storm local mode.

And [~pradeepg26], I'm actually well on the way to having nested foreach 
working. The way I'm working it now is that each LogicalExpressionPlan becomes 
its own Trident BaseFunction. It actually works quite nicely for now. I haven't 
gotten to aggregates yet. What I probably won't implement for the POC is the 
tap and sink.
                
> Implement a Storm backend to Pig
> --------------------------------
>
>                 Key: PIG-3453
>                 URL: https://issues.apache.org/jira/browse/PIG-3453
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Pradeep Gollakota
>              Labels: storm
>
> There is a lot of interest around implementing a Storm backend to Pig for 
> streaming processing. The proposal and initial discussions can be found at 
> https://cwiki.apache.org/confluence/display/PIG/Pig+on+Storm+Proposal
