[jira] [Commented] (PIG-2421) EvalFuncs need redesigned

2012-08-30 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445402#comment-13445402
 ] 

Raghu Angadi commented on PIG-2421:
---


 # +1 for making a context available (current UDFContext is not available for 
UDFs).
#* use case: I want to be able to write this UDF, 'NullIfMissing()', defined 
this way: {code}
a = load 'input' as (p:(one, two, three), q:int);
b = foreach a generate NullIfMissing(p);
describe b;
{t: (one: bytearray, two: bytearray, three: bytearray)}
-- NullIfMissing returns:
-- (null, null, null) if 'p' is null
-- (x, y, z), if p == (x, y, z)
-- (x, y, null) if p == (x, y)
{code}
# making conf available (readonly is sufficient, and probably preferred since a 
UDF context can be used to store any state).
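
The NullIfMissing behaviour listed in the use case above can be sketched in plain Python. This is only an illustration of the padding rule, not Pig UDF code: the function name `null_if_missing` and the explicit `arity` argument are hypothetical, and `arity` stands in for the declared schema width that a UDF could only learn if the context exposed it.

```python
from typing import Optional, Sequence, Tuple


def null_if_missing(p: Optional[Sequence], arity: int) -> Tuple:
    """Pad a null or short tuple with None up to the declared arity.

    `arity` is a hypothetical stand-in for the width of the declared
    schema, e.g. 3 for p:(one, two, three).
    """
    if p is None:
        # A null input becomes a tuple of nulls of the declared width.
        return (None,) * arity
    values = tuple(p)
    # Tuples longer than `arity` pass through unchanged; the use case
    # does not cover that situation, so this is an assumption.
    return values + (None,) * (arity - len(values))


print(null_if_missing(None, 3))
print(null_if_missing(("x", "y", "z"), 3))
print(null_if_missing(("x", "y"), 3))
```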


> EvalFuncs need redesigned
> -
>
> Key: PIG-2421
> URL: https://issues.apache.org/jira/browse/PIG-2421
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.11
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: examples.patch, PIG-newudf.patch
>
>
> The current EvalFunc interface (and associated Algebraic and Accumulator 
> interfaces) have grown unwieldy.  In particular, people have noted the 
> following issues:
> # Writing a UDF requires a lot of boiler plate code.
> # Since UDFs always pass a tuple, users are required to manage their own type 
> checking for input.
> # Declaring schemas for output data is confusing.
> # Writing a UDF that accepts multiple different parameters (using 
> getArgToFuncMapping) is confusing.
> # Using Algebraic and Accumulator interfaces often entails duplicating code 
> from the initial implementation.
> # UDF implementors are exposed to the internals of Pig since they have to 
> know when to return a tuple (Initial, Intermediate) and when not to (exec, 
> Final).
> # The separation of Initial, Intermediate, and Final into separate classes 
> forces code duplication and makes it hard for UDFs in other languages to use 
> those interfaces.
> # There is unused code in the current interface that occasionally causes 
> confusion (e.g. isAsynchronous)
> Any change must be done in a way that allows existing UDFs to continue 
> working essentially forever.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (PIG-2896) Pig does not fail anymore if two macros are declared with the same name

2012-08-30 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-2896.


   Resolution: Fixed
Fix Version/s: 0.11

> Pig does not fail anymore if two macros are declared with the same name
> ---
>
> Key: PIG-2896
> URL: https://issues.apache.org/jira/browse/PIG-2896
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.11
>
> Attachments: PIG-2896.patch
>
>




[jira] [Commented] (PIG-2896) Pig does not fail anymore if two macros are declared with the same name

2012-08-30 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445382#comment-13445382
 ] 

Dmitriy V. Ryaboy commented on PIG-2896:


+1

> Pig does not fail anymore if two macros are declared with the same name
> ---
>
> Key: PIG-2896
> URL: https://issues.apache.org/jira/browse/PIG-2896
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2896.patch
>
>




[jira] [Created] (PIG-2899) PigServer batch mode , pig scripts and parameters

2012-08-30 Thread srinivas (JIRA)
srinivas created PIG-2899:
-

 Summary: PigServer batch mode , pig scripts and parameters
 Key: PIG-2899
 URL: https://issues.apache.org/jira/browse/PIG-2899
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.9.2
Reporter: srinivas
 Fix For: 0.9.3


We are using PigServer directly to run Pig scripts as part of a workflow.

It looks like setbatchmodeon(true) can't be used with 
registerScript(String fileName, Map params).

I am not sure why the PigServer implementation doesn't match the grunt shell 
implementation.

Also, you need to call the pigserver.store method.

Does this make the MultiQuery option impossible with PigServer when using 
registerPigScript?







[jira] [Commented] (PIG-2896) Pig does not fail anymore if two macros are declared with the same name

2012-08-30 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445272#comment-13445272
 ] 

Julien Le Dem commented on PIG-2896:


* TestPigServerWithMacros (containing tests added in PIG-2850) still passes
* TestMacroExpansion (broken by PIG-2850) now passes

> Pig does not fail anymore if two macros are declared with the same name
> ---
>
> Key: PIG-2896
> URL: https://issues.apache.org/jira/browse/PIG-2896
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2896.patch
>
>




[jira] [Commented] (PIG-2896) Pig does not fail anymore if two macros are declared with the same name

2012-08-30 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445195#comment-13445195
 ] 

Julien Le Dem commented on PIG-2896:


PIG-2279 is still open and the source of the problem still has to be found.

> Pig does not fail anymore if two macros are declared with the same name
> ---
>
> Key: PIG-2896
> URL: https://issues.apache.org/jira/browse/PIG-2896
> Project: Pig
>  Issue Type: Bug
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-2896.patch
>
>




[jira] [Created] (PIG-2898) Multithreaded execution of e2e tests

2012-08-30 Thread Andrey Klochkov (JIRA)
Andrey Klochkov created PIG-2898:


 Summary: Multithreaded execution of e2e tests
 Key: PIG-2898
 URL: https://issues.apache.org/jira/browse/PIG-2898
 Project: Pig
  Issue Type: Improvement
  Components: e2e harness
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov


Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The 
bottleneck is the client side, and per our observations it would help a lot 
if the e2e harness could run tests in parallel threads.

We prototyped changes to the e2e harness that allow tests to run in a 
configurable number of threads. Preliminary results show a more than 6x 
reduction in execution time on a small 3-node M/R cluster with a modest 
configuration. We are going to share a patch shortly.
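
As a rough illustration of running tests on a configurable number of threads, here is a minimal Python sketch. It is not the prototype mentioned above and does not reflect the actual e2e harness: `run_test`, `run_all`, and the test names are hypothetical, and the short sleep merely simulates a client waiting on the cluster.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def run_test(name: str) -> tuple:
    """Hypothetical stand-in for submitting one e2e test and waiting for it."""
    time.sleep(0.01)  # simulates client-side waiting on the cluster
    return name, "passed"


def run_all(tests, num_threads: int) -> dict:
    """Run the tests on a fixed-size thread pool and collect results by name."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return dict(pool.map(run_test, tests))


results = run_all([f"test_{i}" for i in range(20)], num_threads=4)
print(len(results), "tests finished")
```

Because the waiting happens on the client side, threads spend most of their time idle, which is why a thread pool can shorten the wall-clock time without extra client CPU.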



[jira] [Created] (PIG-2897) Code coverage calculation for e2e tests

2012-08-30 Thread Andrey Klochkov (JIRA)
Andrey Klochkov created PIG-2897:


 Summary: Code coverage calculation for e2e tests
 Key: PIG-2897
 URL: https://issues.apache.org/jira/browse/PIG-2897
 Project: Pig
  Issue Type: Improvement
  Components: e2e harness
Reporter: Andrey Klochkov
Assignee: Andrey Klochkov


In addition to unit test coverage, it would be useful to have coverage 
analyzed for e2e tests. In particular, this would allow us to find areas of 
code which are not touched by any tests.

We have a working prototype which is Clover-based and works in both local and 
mapred modes. We'll share a patch after some cleanup of the changes. 



[jira] [Commented] (PIG-1891) Enable StoreFunc to make intelligent decision based on job success or failure

2012-08-30 Thread Eli Reisman (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445156#comment-13445156
 ] 

Eli Reisman commented on PIG-1891:
--

I can try to avoid the re-instantiation if you like, or bump the test value, 
whatever is best. And you're comfortable the other test issue is something 
else? This passed the test suite for me, but that was a while back, and I'm not 
extremely knowledgeable on all the areas of the code I'm touching here. Hope to 
be soon :)


> Enable StoreFunc to make intelligent decision based on job success or failure
> -
>
> Key: PIG-1891
> URL: https://issues.apache.org/jira/browse/PIG-1891
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.10.0
>Reporter: Alex Rovner
>Priority: Minor
>  Labels: patch
> Attachments: PIG-1891-1.patch, PIG-1891-2.patch
>
>
> We are in the process of using Pig for various data processing and component 
> integration tasks. Here is where we feel Pig storage funcs are lacking:
> They are not aware of whether the overall job has succeeded. This creates a 
> problem for storage funcs which need to "upload" results into another system:
> DB, FTP, another file system, etc.
> I looked at the DBStorage in the piggybank 
> (http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup)
>  and what I see is essentially a mechanism which, for each task, does the 
> following:
> 1. Creates a recordwriter (in this case, opens a connection to the DB).
> 2. Opens a transaction.
> 3. Writes records into a batch.
> 4. Executes commit or rollback depending on whether the task was successful.
> While this approach works great on a task level, it does not work at all on a 
> job level. 
> If certain tasks succeed but the overall job fails, partial records are 
> going to get uploaded into the DB.
> Any ideas on a workaround? 
> Our current workaround is fairly ugly: we created a Java wrapper that 
> launches Pig jobs and then uploads to the DBs once Pig's job is successful. 
> While the approach works, it's not really integrated into Pig.
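
The difference between task-level and job-level commit described in the quoted issue can be sketched as follows. This is a plain-Python illustration under stated assumptions, not Pig or DBStorage code: `run_job`, the task callables, and the list standing in for the target database are all hypothetical.

```python
from typing import Callable, Iterable, List


def run_job(tasks: Iterable[Callable[[], List[str]]], target: List[str]) -> bool:
    """Stage each task's records; publish to `target` only if every task succeeds."""
    staged: List[str] = []
    try:
        for task in tasks:
            staged.extend(task())  # task-level work, nothing visible in target yet
    except Exception:
        return False  # job failed: staged records are discarded
    target.extend(staged)  # job-level commit
    return True


def failing_task() -> List[str]:
    raise RuntimeError("task failed")


db: List[str] = []
ok = run_job([lambda: ["r1", "r2"], lambda: ["r3"]], db)

failed_db: List[str] = []
failed = run_job([lambda: ["r1"], failing_task], failed_db)

print(ok, db)
print(failed, failed_db)
```

With a per-task commit, the record from the first task in the second run would already be in the target when the later task fails; staging until the whole job succeeds avoids that partial upload.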



[jira] [Updated] (PIG-2895) jodatime jar missing in pig-withouthadoop.jar

2012-08-30 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-2895:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

patch committed to trunk

> jodatime jar missing in pig-withouthadoop.jar
> -
>
> Key: PIG-2895
> URL: https://issues.apache.org/jira/browse/PIG-2895
> Project: Pig
>  Issue Type: Sub-task
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.11
>
> Attachments: PIG-2895.1.patch, PIG-2895.2.patch
>
>
> The jodatime jar is missing from pig-withouthadoop.jar. When an external 
> hadoop.jar is used, Pig fails with a class-not-found error.



PIG to load a Map Type

2012-08-30 Thread Venkatraman, Jagadish
Hi Pig Users/Dev,

We are using Pig for large-scale analysis of our server logs.
Our log records are of the format,
instance_name, host_name, processing_time, 
[component1#1,component2#322,component3]

A line in our CSV file looks like,
Instance1,host1.ms.com,323,[component1#22,componentprocessor#33,third_component#299]


We are attempting to load the data as  type.

A snapshot from our pig script looks like,

A = LOAD 'dlink_data.pig' using PigStorage(',') AS 
(instance_name:chararray,host_name:chararray,processing_time:int, 
components_map:map[]);
DUMP A;

The first 3 fields load correctly.  The Map, however, does not get loaded.
Can someone please advise on this?
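
For reference, the record layout above can be expressed as a small Python sketch. It only illustrates the format and is not how PigStorage parses its input; `parse_record` is a hypothetical helper. Note that the comma is both the field delimiter and the separator between map entries, so a plain split on ',' yields more than four fields; the sketch splits on the first three commas only.

```python
from typing import Dict, Optional, Tuple


def parse_record(line: str) -> Tuple[str, str, int, Dict[str, Optional[int]]]:
    """Parse one log line into (instance, host, processing_time, components)."""
    # Split only on the first three commas so the bracketed map stays intact.
    instance, host, time_str, raw_map = line.strip().split(",", 3)
    components: Dict[str, Optional[int]] = {}
    body = raw_map.strip().strip("[]")
    for entry in body.split(","):
        if not entry:
            continue
        key, sep, value = entry.partition("#")
        # Entries without a '#value' part (like component3 above) map to None.
        components[key.strip()] = int(value) if sep and value else None
    return instance, host, int(time_str), components


record = parse_record(
    "Instance1,host1.ms.com,323,"
    "[component1#22,componentprocessor#33,third_component#299]"
)
print(record)
```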



Jagadish Venkatraman
Data Architecture Group
Morgan Stanley | IM Technology
jagadish.venkatra...@morganstanley.com

