Re: UDFContext NULL JobConf

2013-10-31 Thread Henning Kropp
Sure, just did https://issues.apache.org/jira/browse/PIG-3554

Kind regards


2013/10/30 Cheolsoo Park piaozhe...@gmail.com

 Indeed. Do you mind filing a jira?
 https://issues.apache.org/jira/browse/PIG


 On Wed, Oct 30, 2013 at 11:34 AM, Henning Kropp henning.kr...@gmail.com
 wrote:

  Wow, I was pretty sure I had tested it, but apparently not. The reason
  for the NPE seems to be the @MonitoredUDF annotation.
 
  You can reproduce it with the following code:
 
  import java.io.IOException;
  import java.util.concurrent.TimeUnit;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.pig.EvalFunc;
  import org.apache.pig.builtin.MonitoredUDF;
  import org.apache.pig.data.Tuple;
  import org.apache.pig.impl.util.UDFContext;
 
  @MonitoredUDF(timeUnit = TimeUnit.HOURS, duration = 3, intDefault = 1)
  public class TestJobConf extends EvalFunc<Tuple> {

      public TestJobConf() {
      }

      @Override
      public Tuple exec(Tuple tuple) throws IOException {
          if (tuple == null || tuple.size() == 0) {
              return null;
          }
          Configuration jobConf = UDFContext.getUDFContext().getJobConf();
          // With @MonitoredUDF present, jobConf is null and this line throws the NPE.
          System.err.println(jobConf.toString());
          return null;
      }

  }
 
  Log from the Task:
 
  2013-10-30 19:24:09,516 ERROR TestJobConf:
  java.util.concurrent.ExecutionException:
  java.lang.NullPointerException
  2013-10-30 19:24:10,388 ERROR TestJobConf:
  java.util.concurrent.ExecutionException:
  java.lang.NullPointerException
  2013-10-30 19:24:10,880 ERROR TestJobConf:
  java.util.concurrent.ExecutionException:
  java.lang.NullPointerException
  2013-10-30 19:24:11,512 ERROR TestJobConf:
  java.util.concurrent.ExecutionException:
  java.lang.NullPointerException
  2013-10-30 19:24:12,264 ERROR TestJobConf:
  java.util.concurrent.ExecutionException:
  java.lang.NullPointerException
 
 
  Without the MonitoredUDF Annotation the log output is:
 
  Configuration: core-default.xml, core-site.xml, mapred-default.xml,
  mapred-site.xml, hdfs-default.xml, hdfs-site.xml, .
 
  Seems like a bug.
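
  One plausible cause, stated here as an assumption that this thread does not confirm: the ExecutionException in the log suggests that exec() runs on a separate executor thread when the annotation is present, and a context kept in a ThreadLocal would not be visible from that thread. The sketch below uses only the JDK to show that effect. ThreadLocalContextDemo, setConf, getConf and readFromWorker are hypothetical names for illustration and are not part of Pig.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a per-thread context; this is not Pig's UDFContext.
class ThreadLocalContextDemo {

    // Each thread has its own slot; a thread that never called set() reads null.
    private static final ThreadLocal<String> JOB_CONF = new ThreadLocal<>();

    static void setConf(String conf) {
        JOB_CONF.set(conf);
    }

    static String getConf() {
        return JOB_CONF.get();
    }

    // Reads the value from a worker thread, the way a timeout wrapper might run a call.
    static String readFromWorker() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = ThreadLocalContextDemo::getConf;
            Future<String> result = executor.submit(task);
            return result.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        setConf("job configuration set on the task thread");
        System.out.println("task thread:   " + getConf());
        System.out.println("worker thread: " + readFromWorker());
    }
}
```

  The worker thread never called setConf, so it reads null even though the task thread has a value. If Pig behaves this way, calling toString() on the result inside the monitored exec() would fail as in the log above.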
 
 
  2013/10/30 Pradeep Gollakota pradeep...@gmail.com
 
   Are you able to post your UDF (or at least a sanitized version)?
  
  
   On Wed, Oct 30, 2013 at 10:46 AM, Henning Kropp henning.kr...@gmail.com
   wrote:
  
Hi,

thanks for your reply. I read about the expected behavior on the front-end,
but I am getting the NPE on the back-end. The Mappers log the Exception
during execution.

I am currently digging through debug messages. What should I look out for?
There are a bunch of

[main] DEBUG org.apache.hadoop.conf.Configuration  - java.io.IOException: config()

log messages, but I recall them being normal for reasons I don't
remember.

Regards
   
   
2013/10/30 Cheolsoo Park piaozhe...@gmail.com
   
 Hi,

 Are you getting NPE on the front-end or the back-end? Sounds like jobConf
 is not added to UDFContext, which is expected on the front-end. Please see
 the comments in getJobConf() and addJobConf() in the source code:

 https://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/UDFContext.java

 Thanks,
 Cheolsoo


 On Wed, Oct 30, 2013 at 9:57 AM, Henning Kropp henning.kr...@gmail.com
 wrote:

  Hi,

  I am stuck. In my UDF (Java), which extends EvalFunc, the following code
  throws an NPE in exec() when executed in -x mapred mode:

  Configuration jobConf = UDFContext.getUDFContext().getJobConf();
  System.err.println(jobConf.toString());

  I did not find any useful information as to why my JobConf is always null.
  All I find is that this is the right way to get the JobConf in a UDF and
  that the behavior of what is returned when running locally (jira issue).

  Any ideas? I am running it on a very old Hadoop version, 0.20.2. Are there
  any known issues? I use Pig 0.11.1.

  Many thanks in advance

  PS: Just found someone with the same issue:

  http://stackoverflow.com/questions/18795008/accessing-hdfs-from-pig-udf
 

   
  
 



simple pig logic

2013-10-31 Thread jamal sasha
Hi,
I have two datasets:
main_data.txt
{id:foo, some_field:12354, score:0}
{id:foobar, some_field:12354, score:0}


score_data.txt
{id:foo, score:1}
{id:foobar,score:20}



So in main_data, score is initialized to 0.
Also, main_data and score_data have some ids in common.

For the ids which are common, I want to replace score in main_data with
score in score_data.

And if an id is absent from score_data, I want to leave its score at 0.


UDFContext NULL JobConf

2013-10-31 Thread Henning Kropp
Hi,

in my UDF (Java) the following code throws an NPE when executed in -x
mapred mode:

Configuration jobConf = UDFContext.getUDFContext().getJobConf();
System.err.println(jobConf.toString());

I did not find any useful information as to why my JobConf is always null. All
I find is that this is the right way to get the JobConf in a UDF and that
the behavior of what is returned when running locally (jira issue).

Any ideas? I am running it on a very old Hadoop version, 0.20.2. Are there
any known issues?

Many thanks in advance


Re: simple pig logic

2013-10-31 Thread Pradeep Gollakota
If I understood your question correctly, given the following input:

main_data.txt
{id: foo, some_field: 12354, score: 0}
{id: foobar, some_field: 12354, score: 0}
{id: baz, some_field: 12345, score: 0}

score_data.txt
{id: foo, score: 1}
{id: foobar, score: 20}

you want the following output

{id: foo, some_field: 12354, score: 1}
{id: foobar, some_field: 12354, score: 20}
{id: baz, some_field: 12345, score: 0}

If that is correct, you can do a LEFT OUTER join on the two relations.

main = LOAD 'main_data.txt' as (id: chararray, some_field: int, score: int);
scores = LOAD 'score_data.txt' as (id: chararray, score: int);
both = JOIN main by id LEFT, scores by id;
final = FOREACH both GENERATE main::id as id, main::some_field as some_field,
    (scores::score is null ? main::score : scores::score) as score;
dump final;

After the join, check to see if the scores::score is null… if it is, choose
the default of main::score… if not choose scores::score.
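
For readers less familiar with outer joins, the same logic can be sketched in plain Java. This only illustrates what the join and the null check compute and is not Pig code: LeftJoinScores and mergeScores are hypothetical names, and some_field is left out to keep the sketch short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration of a LEFT OUTER join followed by a null check.
class LeftJoinScores {

    // For every id in mainScores, take the score from newScores when one exists;
    // otherwise keep the default already stored in mainScores.
    static Map<String, Integer> mergeScores(Map<String, Integer> mainScores,
                                            Map<String, Integer> newScores) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> row : mainScores.entrySet()) {
            Integer joined = newScores.get(row.getKey()); // null when there is no match
            merged.put(row.getKey(), joined == null ? row.getValue() : joined);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> mainScores = new LinkedHashMap<>();
        mainScores.put("foo", 0);
        mainScores.put("foobar", 0);
        mainScores.put("baz", 0);

        Map<String, Integer> newScores = new LinkedHashMap<>();
        newScores.put("foo", 1);
        newScores.put("foobar", 20);

        System.out.println(mergeScores(mainScores, newScores));
    }
}
```

Every id from the main data appears in the result, and baz keeps its score of 0 because it has no match in the score data.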

Hope this helps!