Re: How to find input file associated with failed map task?

2011-02-17 Thread Charles Gonçalves
Hi Scott, I work with lots of gzipped files too, and I used to get the same error sometimes. I started checking the gzip files before processing them; in fact, I check them immediately after I put them on HDFS. I cat each gzip file and test it with gzip -t. For example, all the
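The check described above (cat the file and pipe it through gzip -t) can also be done from Python. This is a minimal sketch, assuming Python 3.8+ and that the compressed data arrives as a binary stream, for example piped from `hadoop fs -cat`. The function names are hypothetical helpers, not part of Pig or Hadoop. Reading the stream to the end makes the `gzip` module verify each member's CRC32 and length trailer, which is roughly what gzip -t does.

```python
import gzip
import zlib
from typing import BinaryIO


def gzip_stream_is_valid(stream: BinaryIO, chunk_size: int = 1 << 20) -> bool:
    """Return True if `stream` decompresses cleanly to the end (like `gzip -t`).

    Hypothetical helper: decompresses and discards the data chunk by chunk.
    """
    try:
        with gzip.GzipFile(fileobj=stream, mode="rb") as f:
            # Reading to EOF forces the CRC32/ISIZE trailer check of every member.
            while f.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad magic bytes, CRC mismatch),
        # EOFError is raised for truncated input, zlib.error for corrupt deflate data.
        return False


def gzip_file_is_valid(path: str) -> bool:
    """Same check for a local copy of the file (hypothetical helper)."""
    with open(path, "rb") as raw:
        return gzip_stream_is_valid(raw)
```

When used in a pipe after `hadoop fs -cat`, the stream to pass would be `sys.stdin.buffer`. Reading in fixed-size chunks keeps memory use flat even for large log files.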

Re: Problems with union, projection producing unexpected results

2011-02-17 Thread Jonathan Coveney
I am glad that you got this in a reproducible form! I have seen this error as well (where the output is just the last value repeated instead of the multiple values that you want), but wasn't able to give a concrete example. 2011/2/16 James Kebinger jkebin...@gmail.com Hello all, I've been scratching

JSON Loading on EMR

2011-02-17 Thread Eric Lubow
Hello, I'll preface this by saying that I know very, very little Java and am just learning Pig. My situation is that I am aggregating logs with Flume into a single log file. All my logs are in JSON format and are gzipped before being added to S3. I have 3 types of log lines in each

Has anyone run into problems with filter simply not working?

2011-02-17 Thread Jonathan Coveney
This is weird, because in my case it seems to be nondeterministic. I have a text file, thing.txt, that simply contains: http://www.guardian.co.uk/ asjlkdajlkdad askjldajlksdjlkasjdlkajslkdjalds asdjaskdjlasjdlkad http://www.guardian.co.uk/adsasd http://www.guardian.co.uk/sadasd

Re: Problems with union, projection producing unexpected results

2011-02-17 Thread James Kebinger
Interesting, maybe I should file a bug report then? On Thu, Feb 17, 2011 at 10:41 AM, Jonathan Coveney jcove...@gmail.com wrote: I am glad that you got this in a reproducible form! I have seen this error as well (where the output is just the last value repeated instead of the multiple values that

Re: Problems with union, projection producing unexpected results

2011-02-17 Thread James Kebinger
https://issues.apache.org/jira/browse/PIG-1859 On Thu, Feb 17, 2011 at 12:32 PM, James Kebinger jkebin...@gmail.com wrote: Interesting, maybe I should file a bug report then? On Thu, Feb 17, 2011 at 10:41 AM, Jonathan Coveney jcove...@gmail.com wrote: I am glad that you got this in a

Quick question about Reading dirs

2011-02-17 Thread Charles Gonçalves
Guys, does Pig read the _logs directories in a script's output directory? What I want is to read a Pig output directory (or several) from Pig scripts, but I only want the part- files, not the .part-crc or _logs files. Thanks -- Charles Ferreira Gonçalves http://homepages.dcc.ufmg.br/~charles/ UFMG

Re: Quick question about Reading dirs

2011-02-17 Thread Richard Ding
Files starting with . are also ignored. -Richard On 2/17/11 3:23 PM, Ramesh, Amit amram...@amazon.com wrote: Directory names starting with underscores are ignored, but I am not certain about .* files/directories. Amit On 2/17/11 3:12 PM, Charles Gonçalves charles...@gmail.com wrote:
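The replies describe a simple rule: names starting with an underscore (such as _logs) or a dot (such as the .crc checksum files) are skipped when an output directory is read back as input. As a rough illustration, assuming that rule is applied to every path component below the output directory, here is a Python sketch of the same filter. `is_hidden` and `visible_inputs` are hypothetical helpers, not Pig or Hadoop APIs, and the listing used below is made up.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def is_hidden(name: str) -> bool:
    """True for names the replies say are skipped: a leading '_' or '.'."""
    return name.startswith(("_", "."))


def visible_inputs(relative_paths: Iterable[str]) -> List[str]:
    """Keep only paths (relative to the output dir) with no hidden component.

    Assumption: a file inside a hidden directory (e.g. _logs/...) is skipped too.
    """
    return [
        p for p in relative_paths
        if not any(is_hidden(part) for part in PurePosixPath(p).parts)
    ]
```

For a made-up listing such as `["part-m-00000", ".part-m-00000.crc", "_logs/history/job_info", "_SUCCESS", "part-m-00001"]`, `visible_inputs` returns `["part-m-00000", "part-m-00001"]`, i.e. only the part- files Charles asked about.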