Hi,
As you know, a lot of work this year went into performance optimization of Pig. One of the main sources of performance problems is high memory usage. To address this, we propose switching the internal implementation of strings from Java Strings to Hadoop Text, because Text has lower memory overhead.

Examples (assumes ASCII data; sizes are in bytes):

Real String    Java String    Hadoop Text
          5             46             37
         10             56             42
         20             76             52
         40            116             72
         80            196            112

As the size of the strings grows, so does the gap between the two implementations.

Making this change would have no impact on Pig users; however, it will affect existing UDFs that work with Strings. Our question is whether UDF writers/owners are comfortable with the proposed transition and will update their UDFs.

Please let us know by the end of next week if you strongly object to this proposal. Otherwise, we will go forward with this plan.

Thanks,

Olga