Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by DavidPhillips:
http://wiki.apache.org/pig/FAQ

The comment on the change is:
this page is old, the new one is PigFaq

------------------------------------------------------------------------------
- Pig FAQ
+ deleted
- 1. I'm using PigStorage to parse my input files. Can I make it use control characters as delimiters?
-
- A. Yes. Examples: PigStorage('\u0001') for Ctrl+A or '\u007C' for this character: |
-
- 2. Can I do a numerical comparison while filtering?
-
- A. Yes, you can choose between numerical and string comparison. For numerical comparison use the operators =, <>, < etc. and for string comparisons use eq, neq etc.
-
- 3. How do I make my jobs run on multiple machines?
-
- A. Use the PARALLEL clause. For example =C = JOIN A by url, B by url PARALLEL 50=
-
- 4. Does Pig support NULLs?
-
- A. Pig currently has no support for NULL values but it is on the roadmap.
-
- 5. Does Pig support regular expressions?
-
- A. Pig supports regular expression matching via the =matches= keyword. It uses java.util.regex matching, which means your pattern has to match the entire string (i.e. if your string is "hi fred" and you want to find "fred", you have to give a pattern of ".*fred", not "fred").
-
- 6. How do I prevent failures when some records don't have the needed number of columns?
-
- You can filter away those records by including the following in your Pig program:
-
-
- A = load 'foo' using PigStorage('\t');
- B = FILTER A BY ARITY(*) >= 5;
- .....
-
-
- This code would drop all the records that have fewer than 5 columns.
-
- 7. Is there any difference between == and eq for numeric comparisons?
-
- For equality, there is no difference while you stay in integers. However, 11.0 and 11 will be equal with == but not with eq.
-
- 8. Is there an easy way for me to figure out how many rows exist in a dataset from its alias?
-
- You can run the following set of commands:
-
-
- a = load 'bla' ... ;
-
- b = group a all;
-
- c = foreach b generate COUNT(a.$0);
-
-
- This is equivalent to select count(*) in SQL.
-
- 9. Does Pig allow grouping on expressions?
-
- Currently, Pig only allows grouping on data fields rather than expressions. Allowing grouping on expressions is on our roadmap. Stay tuned!
-