[ https://issues.apache.org/jira/browse/PIG-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohini Palaniswamy updated PIG-5323:
------------------------------------
Status: Patch Available (was: Open)
> Implement LastInputStreamingOptimizer in Tez
> --------------------------------------------
>
> Key: PIG-5323
> URL: https://issues.apache.org/jira/browse/PIG-5323
> Project: Pig
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.18.0
>
> Attachments: PIG-5323-1.patch
>
>
> http://pig.apache.org/docs/r0.17.0/perf.html#join-optimizations
> {quote}
> Optimization for regular joins ensures that the last table in the join is not
> brought into memory but streamed through instead. Optimization reduces the
> amount of memory used which means you can avoid spilling the data and also
> should be able to scale your query to larger data volumes.
> To take advantage of this optimization, make sure that the table with the
> largest number of tuples per key is the last table in your query. In some of
> our tests we saw 10x performance improvement as the result of this
> optimization.
> {quote}
> We are not doing that in Tez: both tables are materialized as
> InternalCachedBag.
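The streaming optimization quoted above can be sketched in Java for a single join-key group. This is a minimal illustration under stated assumptions, not Pig's actual packager or Tez code. The class `StreamLastJoin` and the method `joinKeyGroup` are hypothetical names, tuples are modeled as plain strings, and the sketch assumes that the tuples for one key arrive grouped by input, with the last input's tuples arriving last. Under that assumption, only the earlier inputs need to be buffered (the role `InternalCachedBag` plays). The last input is read one tuple at a time: each tuple is joined against the buffered tuples, emitted, and then dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of "stream the last join input" for one key group.
 * Not Pig's real implementation; names and types are illustrative only.
 */
public class StreamLastJoin {

    /**
     * Inner-joins the tuples of one key group.
     * bufferedInputs: tuples of every input except the last, held in memory
     *                 (stand-in for InternalCachedBag).
     * lastInput:      tuples of the last input, consumed as a stream and never buffered.
     */
    public static void joinKeyGroup(List<List<String>> bufferedInputs,
                                    Iterator<String> lastInput,
                                    Consumer<List<String>> emit) {
        while (lastInput.hasNext()) {
            String lastTuple = lastInput.next();
            // Combine this one streamed tuple with every combination of buffered tuples.
            emitCombinations(bufferedInputs, 0, new ArrayList<>(), lastTuple, emit);
            // lastTuple is no longer referenced here, so memory use does not grow
            // with the size of the last input.
        }
    }

    // Recursively walks the buffered inputs without materializing their cross product.
    private static void emitCombinations(List<List<String>> bufferedInputs, int idx,
                                         List<String> prefix, String lastTuple,
                                         Consumer<List<String>> emit) {
        if (idx == bufferedInputs.size()) {
            List<String> row = new ArrayList<>(prefix);
            row.add(lastTuple);
            emit.accept(row);
            return;
        }
        // An empty buffered input yields no rows, matching inner-join semantics.
        for (String t : bufferedInputs.get(idx)) {
            prefix.add(t);
            emitCombinations(bufferedInputs, idx + 1, prefix, lastTuple, emit);
            prefix.remove(prefix.size() - 1); // remove by index, then backtrack
        }
    }

    public static void main(String[] args) {
        // Two-table join for one key: table A is buffered, table B (placed last) is streamed.
        List<List<String>> buffered = List.of(Arrays.asList("a1", "a2"));
        Iterator<String> last = Arrays.asList("b1", "b2", "b3").iterator();
        joinKeyGroup(buffered, last, row -> System.out.println(row));
    }
}
```

Memory in this sketch is bounded by the buffered inputs alone, which is why the quoted documentation recommends placing the table with the most tuples per key last.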
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)