[jira] Updated: (PIG-1016) Reading in map data seems broken

2009-10-16 Thread hc busy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hc busy updated PIG-1016:
-

Attachment: (was: PIG-1016.patch)

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
>
> Hi, I'm trying to load a map that has a tuple as its value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all of the 
> documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as 
> documented. I've done simple verification of loading in maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-993) [zebra] Abitlity to drop a column group in a table

2009-10-16 Thread Raghu Angadi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766841#action_12766841
 ] 

Raghu Angadi commented on PIG-993:
--


I think the test needs to be fixed.  It deletes 6 column groups from 6 
different threads. The spec explicitly states that read accesses and parallel 
deletions are expected to fail, but the table is always left in a consistent state. 
The rationale is that in practice these tables are accessed from 
different machines, and it would add unnecessary complication to coordinate 
all the readers and writers (especially with no locking support 
on HDFS). Zebra tables have no state outside the directory. This applies to 
writing as well.

One option I see is to have each thread make multiple attempts in case of 
errors. 
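Raghu's suggestion of having each thread retry can be sketched as a small helper. This is a minimal illustration, not Zebra's API: IoAction and runWithRetries are hypothetical names, and the simulated IOException stands in for whatever a concurrent column-group deletion actually throws.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryingDrop {

    /** Hypothetical stand-in for one operation, e.g. dropping a column group. */
    @FunctionalInterface
    interface IoAction {
        void run() throws IOException;
    }

    /**
     * Runs the action up to maxAttempts times, sleeping a little longer after
     * each failure. Rethrows the last IOException if every attempt fails.
     */
    static void runWithRetries(IoAction action, int maxAttempts, long backoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return; // success, stop retrying
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt); // linear backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Simulated drop that fails twice (as if racing another thread), then succeeds.
        runWithRetries(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IOException("simulated concurrent modification");
            }
        }, 5, 10);
        System.out.println("succeeded after " + calls.get() + " attempts");
    }
}
```

Because the spec allows a parallel deletion to fail while keeping the table consistent, retrying on the caller's side is enough; no cross-thread locking is needed.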
  

> [zebra] Abitlity to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, 
> TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, zebra-drop-cg.patch, 
> zebra-drop-cg.patch, zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple subtables, each containing a set of 
> columns called a column group (CG). The user specifies how these columns are 
> grouped while creating a table through the _storage hint_.
> For some large tables, it might be necessary for users to remove a set 
> of columns and retain the rest. This jira provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on API and the semantics. 




[jira] Commented: (PIG-644) Duplicate column names in foreach do not throw parser error

2009-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766839#action_12766839
 ] 

Hadoop QA commented on PIG-644:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422324/PIG-644-1.patch
  against trunk revision 826110.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 307 release audit warnings 
(more than the trunk's current 305 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/91/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/91/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/91/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/91/console

This message is automatically generated.

> Duplicate column names in foreach do not throw parser error
> ---
>
> Key: PIG-644
> URL: https://issues.apache.org/jira/browse/PIG-644
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: blah.txt, PIG-644-1.patch
>
>
> Consider the following Pig script where we generate column names b and b in 
> the FOREACH
> {code}
> DATA = LOAD 'blah.txt' as (a:long, b:long);
> RESULT = FOREACH DATA GENERATE a, b, (b>20?b:0) as b;
> DESCRIBE RESULT;
> dump RESULT;
> {code}
> Pig runs the script successfully and does not complain of the duplicate 
> column names.  I do not know if the new error handling framework will handle 
> these kinds of cases. 
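A parser-side check for this case could collect the output aliases of a FOREACH ... GENERATE and reject repeats. The sketch below assumes the aliases are already available as a list of strings; DuplicateAliasCheck and duplicates are hypothetical names, not Pig's parser API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateAliasCheck {

    /** Returns each alias that appears more than once, in first-repeat order. */
    static List<String> duplicates(List<String> aliases) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String alias : aliases) {
            if (!seen.add(alias)) { // add() returns false if already present
                dups.add(alias);
            }
        }
        return new ArrayList<>(dups);
    }

    public static void main(String[] args) {
        // Output aliases of: FOREACH DATA GENERATE a, b, (b>20?b:0) as b;
        List<String> out = Arrays.asList("a", "b", "b");
        List<String> dups = duplicates(out);
        if (!dups.isEmpty()) {
            System.out.println("Duplicate schema alias: " + dups); // prints "Duplicate schema alias: [b]"
        }
    }
}
```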




[jira] Commented: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-16 Thread Kevin Weil (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766838#action_12766838
 ] 

Kevin Weil commented on PIG-1025:
-

I very much agree that the test case is weak.  I followed the model for the 
rest of the grunt tests, which are similarly weak :) 

> Should be able to set job priority through Pig Latin
> 
>
> Key: PIG-1025
> URL: https://issues.apache.org/jira/browse/PIG-1025
> Project: Pig
>  Issue Type: New Feature
>  Components: grunt
>Affects Versions: 0.4.0
>Reporter: Kevin Weil
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: PIG-1025.patch
>
>
> Currently users can set the job name through Pig Latin by saying
> set job.name 'my job name'
> The ability to set the priority would also be nice, and the patch should be 
> small.  The goal is to be able to say
> set job.priority 'high'
> and throw a JobCreationException in the JobControlCompiler if the priority is 
> not one of the allowed string values from the o.a.h.mapred.JobPriority enum: 
> very_low, low, normal, high, very_high.   Case insensitivity makes this a 
> little nicer.
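The case-insensitive validation described above can be sketched with a plain enum lookup. To stay self-contained, the sketch uses a local enum mirroring the values of o.a.h.mapred.JobPriority and a hypothetical JobCreationException stand-in rather than Pig's real class; it also assumes the quotes around 'high' have already been stripped by the parser.

```java
import java.util.Locale;

public class JobPriorityParser {

    /** Local mirror of the values in org.apache.hadoop.mapred.JobPriority. */
    enum Priority { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

    /** Hypothetical stand-in for Pig's JobCreationException. */
    static class JobCreationException extends Exception {
        JobCreationException(String msg) { super(msg); }
    }

    /** Parses a 'set job.priority' value, ignoring case and surrounding spaces. */
    static Priority parse(String value) throws JobCreationException {
        if (value == null) {
            throw new JobCreationException("job.priority must not be null");
        }
        try {
            // Locale.ROOT avoids locale-specific case mapping surprises.
            return Priority.valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new JobCreationException("Invalid job.priority '" + value
                    + "'; expected one of very_low, low, normal, high, very_high");
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("high"));     // prints HIGH
        System.out.println(parse("Very_Low")); // prints VERY_LOW
        try {
            parse("urgent");
        } catch (JobCreationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```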




[jira] Assigned: (PIG-944) Zebra schema is taken from Pig through TableStorer's construct

2009-10-16 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou reassigned PIG-944:


Assignee: Yan Zhou  (was: Ying He)

> Zebra schema is taken from Pig through TableStorer's construct
> --
>
> Key: PIG-944
> URL: https://issues.apache.org/jira/browse/PIG-944
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Yan Zhou
> Fix For: 0.6.0
>
> Attachments: SchemaConversion.patch, SchemaConversion.patch
>
>
> It should be taken from StoreConfig in the TableOutputFormat.checkOutputSpecs 
> method, because the information is dynamic in Pig's execution engine, rather 
> than from a static argument passed to the constructor.




[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766806#action_12766806
 ] 

Alan Gates commented on PIG-928:


A couple thoughts:

1) I still have to figure out how to do type translation in BSF.  The current 
patch just assumes one string argument and then does reflection on the fly on 
return to figure out what it is returning.  We may or may not be able to expose 
schemas to scripted UDFs (a la outputSchema and argToFuncMapping), but we at 
least need to handle multiple and non-string arguments.  I need to do more 
digging in order to understand how to do this type translation in BSF.

2) For at least one of jython or jruby, we've got to show better than a 30x 
differential.  There are some products you're just too embarrassed to sell.  We 
may be able to speed this up somewhat by having the framework figure out the return 
type for this UDF and always convert the returned object based on that return 
type rather than using reflection.
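The idea in point 2, choosing a conversion once from the declared return type instead of inspecting every returned object, might look like the sketch below. PigType, ReturnConverter and converterFor are hypothetical names (not Pig's DataType API), and the sketch assumes the script engine hands back ordinary Java Number or String objects.

```java
import java.util.function.Function;

public class ReturnConverter {

    /** Hypothetical subset of declared UDF return types. */
    enum PigType { CHARARRAY, INTEGER, LONG, DOUBLE }

    /** Picks a converter once, up front, from the declared return type. */
    static Function<Object, Object> converterFor(PigType declared) {
        switch (declared) {
            case CHARARRAY: return o -> o == null ? null : o.toString();
            case INTEGER:   return o -> o == null ? null : ((Number) o).intValue();
            case LONG:      return o -> o == null ? null : ((Number) o).longValue();
            case DOUBLE:    return o -> o == null ? null : ((Number) o).doubleValue();
            default: throw new IllegalArgumentException("unsupported type: " + declared);
        }
    }

    public static void main(String[] args) {
        // The switch runs once; each UDF call then just applies the chosen function.
        Function<Object, Object> toLong = converterFor(PigType.LONG);
        for (Object raw : new Object[] {1, 2.0, 3L}) {
            System.out.println(toLong.apply(raw)); // prints 1, then 2, then 3
        }
    }
}
```

The per-call cost becomes a single cast and unboxing step rather than a chain of type checks, which is where the hoped-for speedup would come from.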

I don't know Ruby or Python, and I don't have time at the moment to go learn 
either.  If someone is willing to give me snippets of python and/or ruby that 
mimic the split functionality given in the patch, I'm happy to test against 
those two in BSF and see what happens.

> UDFs in scripting languages
> ---
>
> Key: PIG-928
> URL: https://issues.apache.org/jira/browse/PIG-928
> Project: Pig
>  Issue Type: New Feature
>Reporter: Alan Gates
> Attachments: package.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, 
> ruby, etc.  This frees users from needing to compile Java, generate a jar, 
> etc.  It also opens Pig to programmers who prefer scripting languages over 
> Java.




[jira] Updated: (PIG-944) Zebra schema is taken from Pig through TableStorer's construct

2009-10-16 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-944:
-

Assignee: Ying He  (was: Yan Zhou)
  Status: Patch Available  (was: Open)

> Zebra schema is taken from Pig through TableStorer's construct
> --
>
> Key: PIG-944
> URL: https://issues.apache.org/jira/browse/PIG-944
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Yan Zhou
>Assignee: Ying He
> Fix For: 0.6.0
>
> Attachments: SchemaConversion.patch, SchemaConversion.patch
>
>
> It should be taken from StoreConfig in the TableOutputFormat.checkOutputSpecs 
> method, because the information is dynamic in Pig's execution engine, rather 
> than from a static argument passed to the constructor.




Build failed in Hudson: Pig-trunk #592

2009-10-16 Thread Apache Hudson Server
See 

Changes:

[gates] PIG-993 Ability to drop a column group in a table.

[gates] PIG-858: Order By followed by "replicated" join fails while compiling 
MR-plan from physical plan.

[daijy] PIG-1020: Include an ant target to build pig.jar without hadoop 
libraries

--
[...truncated 2547 lines...]

ivy-init-dirs:

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:

ivy-buildJar:
[ivy:resolve] :: resolving dependencies :: 
org.apache.pig#Pig;2009-10-17_00-27-30
[ivy:resolve]   confs: [buildJar]
[ivy:resolve]   found com.jcraft#jsch;0.1.38 in maven2
[ivy:resolve]   found jline#jline;0.9.94 in maven2
[ivy:resolve]   found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve]   found junit#junit;4.5 in default
[ivy:resolve] :: resolution report :: resolve 59ms :: artifacts dl 4ms
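---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|     buildJar     |   4   |   0   |   0   |   0   ||   4   |   0   |
---------------------------------------------------------------------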
[ivy:retrieve] :: retrieving :: org.apache.pig#Pig
[ivy:retrieve]  confs: [buildJar]
[ivy:retrieve]  1 artifacts copied, 3 already retrieved (288kB/4ms)

buildJar:
 [echo] svnString 826142
  [jar] Building jar: 

 [copy] Copying 1 file to 


jarWithOutSvn:

findbugs:
[mkdir] Created dir: 

 [findbugs] Executing findbugs from ant task
 [findbugs] Running FindBugs...
 [findbugs] The following classes needed for analysis were missing:
 [findbugs]   com.jcraft.jsch.SocketFactory
 [findbugs]   com.jcraft.jsch.Logger
 [findbugs]   jline.Completor
 [findbugs]   com.jcraft.jsch.Session
 [findbugs]   com.jcraft.jsch.HostKeyRepository
 [findbugs]   com.jcraft.jsch.JSch
 [findbugs]   com.jcraft.jsch.UserInfo
 [findbugs]   jline.ConsoleReaderInputStream
 [findbugs]   com.jcraft.jsch.HostKey
 [findbugs]   jline.ConsoleReader
 [findbugs]   com.jcraft.jsch.ChannelExec
 [findbugs]   jline.History
 [findbugs]   com.jcraft.jsch.ChannelDirectTCPIP
 [findbugs]   com.jcraft.jsch.JSchException
 [findbugs]   com.jcraft.jsch.Channel
 [findbugs] Warnings generated: 392
 [findbugs] Missing classes: 16
 [findbugs] Calculating exit code...
 [findbugs] Setting 'missing class' flag (2)
 [findbugs] Setting 'bugs found' flag (1)
 [findbugs] Exit code set to: 3
 [findbugs] Java Result: 3
 [findbugs] Classes needed for analysis were missing
 [findbugs] Output saved to 

 [xslt] Processing 

 to 

 [xslt] Loading stylesheet 
/homes/gkesavan/tools/findbugs/latest/src/xsl/default.xsl

BUILD SUCCESSFUL
Total time: 2 minutes 49 seconds
+ mv build/pig-2009-10-17_00-27-30.tar.gz 

+ mv build/test/findbugs 

+ mv build/docs/api 

+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant clean
Buildfile: build.xml

clean:
   [delete] Deleting directory 

   [delete] Deleting directory 

   [delete] Deleting directory 

   [delete] Deleting directory 


BUILD SUCCESSFUL
Total time: 0 seconds
+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant 
-Dtest.junit.output.format=xml -Dtest.output=yes 
-Dcheckstyle.home=/homes/hudson/tools/checkstyle/latest -Drun.clover=true 
-Dclover.home=/homes/hudson/tools/clover/clover-ant-2.3.2 clover test 
generate-clover-reports
Buildfile: build.xml

clover.setup:
[mkdir] Created dir: 

[clover-setup] Clover Version 2.3.2, built on July 15 2008 (build-732)
[clover-setup] Loaded from: 
/homes/hudson/tools/clover/clover-ant-2.3.2/lib/clover.jar
[clover-setup] Clover: Open Source License registered to Apache Software 
Foundation.
[clover-set

[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766787#action_12766787
 ] 

Hadoop QA commented on PIG-1016:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422303/PIG-1016.patch
  against trunk revision 826047.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/90/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/90/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/90/console


> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple as its value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all of the 
> documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as 
> documented. I've done simple verification of loading in maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766774#action_12766774
 ] 

Ashutosh Chauhan commented on PIG-928:
--

Right, I overlooked it. I think Ruby and Python are the two most widely used 
scripting languages, and both are supported by BSF. So, comparing BSF with 
direct bindings:
1) Performance: initial tests show the two to be almost equal.
2) Support for multiple languages.
3) Ease of implementation.
To me, BSF seems to be the way to go for this, at least for the first cut. 
Implementing this feature using BSF will allow us to expose it to users 
quickly, and if many people are using it and find one particular language to 
be slow, we can explore direct bindings for that particular language. 
Thoughts?

> UDFs in scripting languages
> ---
>
> Key: PIG-928
> URL: https://issues.apache.org/jira/browse/PIG-928
> Project: Pig
>  Issue Type: New Feature
>Reporter: Alan Gates
> Attachments: package.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, 
> ruby, etc.  This frees users from needing to compile Java, generate a jar, 
> etc.  It also opens Pig to programmers who prefer scripting languages over 
> Java.




[jira] Commented: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766771#action_12766771
 ] 

Ashutosh Chauhan commented on PIG-1025:
---

Useful feature. The patch looks straightforward. In your test case you are only 
testing whether it parses correctly; I would suggest also testing 
whether the priority is actually set in the jobconf.
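The suggested test checks that the value reaches the job configuration, not just that the parser accepts it. A minimal, self-contained sketch of that shape follows: a Map stands in for the real JobConf, buildJobConf is a hypothetical stand-in for the compiler step, and the key "mapred.job.priority" is an assumption based on what Hadoop 0.20's JobConf.setJobPriority is believed to write.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

public class PriorityPropagationCheck {

    // Assumed Hadoop 0.20-era configuration key for job priority.
    static final String CONF_KEY = "mapred.job.priority";

    /** Hypothetical stand-in for the step that turns Pig properties into a job config. */
    static Map<String, String> buildJobConf(Properties pigProps) {
        Map<String, String> conf = new HashMap<>();
        String priority = pigProps.getProperty("job.priority");
        if (priority != null) {
            conf.put(CONF_KEY, priority.toUpperCase(Locale.ROOT));
        }
        return conf;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("job.priority", "high"); // as if from: set job.priority 'high'
        Map<String, String> conf = buildJobConf(props);
        // Assert on the configuration the job would actually run with.
        if (!"HIGH".equals(conf.get(CONF_KEY))) {
            throw new AssertionError("priority not propagated: " + conf);
        }
        System.out.println("ok");
    }
}
```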

> Should be able to set job priority through Pig Latin
> 
>
> Key: PIG-1025
> URL: https://issues.apache.org/jira/browse/PIG-1025
> Project: Pig
>  Issue Type: New Feature
>  Components: grunt
>Affects Versions: 0.4.0
>Reporter: Kevin Weil
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: PIG-1025.patch
>
>
> Currently users can set the job name through Pig Latin by saying
> set job.name 'my job name'
> The ability to set the priority would also be nice, and the patch should be 
> small.  The goal is to be able to say
> set job.priority 'high'
> and throw a JobCreationException in the JobControlCompiler if the priority is 
> not one of the allowed string values from the o.a.h.mapred.JobPriority enum: 
> very_low, low, normal, high, very_high.   Case insensitivity makes this a 
> little nicer.




[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766769#action_12766769
 ] 

Alan Gates commented on PIG-928:


jython was the one I was assuming people would want.

> UDFs in scripting languages
> ---
>
> Key: PIG-928
> URL: https://issues.apache.org/jira/browse/PIG-928
> Project: Pig
>  Issue Type: New Feature
>Reporter: Alan Gates
> Attachments: package.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, 
> ruby, etc.  This frees users from needing to compile Java, generate a jar, 
> etc.  It also opens Pig to programmers who prefer scripting languages over 
> Java.




[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766763#action_12766763
 ] 

Ashutosh Chauhan commented on PIG-928:
--

Still, a useful finding from this test is that BSF is not slower than direct bindings 
(this needs additional verification, though). So this feature could be implemented 
with a lot less code and complexity using BSF, as opposed to using different direct 
bindings for different languages. On the other hand, the only useful language BSF 
currently supports is Ruby. I'm not sure how many Pig users will also be 
interested in Groovy, JavaScript, etc. (the other languages supported by BSF).

> UDFs in scripting languages
> ---
>
> Key: PIG-928
> URL: https://issues.apache.org/jira/browse/PIG-928
> Project: Pig
>  Issue Type: New Feature
>Reporter: Alan Gates
> Attachments: package.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, 
> ruby, etc.  This frees users from needing to compile Java, generate a jar, 
> etc.  It also opens Pig to programmers who prefer scripting languages over 
> Java.




[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766757#action_12766757
 ] 

Alan Gates commented on PIG-928:


I expected to see the direct bindings to be faster as well, but the tests 
didn't show that.  In the code contributed by Kishore the type translation was 
done the same regardless of the bindings used.  Perhaps there would be a more 
efficient way to do the type translation for direct bindings.  

> UDFs in scripting languages
> ---
>
> Key: PIG-928
> URL: https://issues.apache.org/jira/browse/PIG-928
> Project: Pig
>  Issue Type: New Feature
>Reporter: Alan Gates
> Attachments: package.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, 
> ruby, etc.  This frees users from needing to compile Java, generate a jar, 
> etc.  It also opens Pig to programmers who prefer scripting languages over 
> Java.




[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766750#action_12766750
 ] 

Ashutosh Chauhan commented on PIG-928:
--

30x is indeed too slow. But between BSF and direct bindings, I would have expected 
direct bindings to be more performant, since BSF adds an extra layer of 
translation. Isn't that so? 

> UDFs in scripting languages
> ---
>
> Key: PIG-928
> URL: https://issues.apache.org/jira/browse/PIG-928
> Project: Pig
>  Issue Type: New Feature
>Reporter: Alan Gates
> Attachments: package.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, 
> ruby, etc.  This frees users from needing to compile Java, generate a jar, 
> etc.  It also opens Pig to programmers who prefer scripting languages over 
> Java.




[jira] Commented: (PIG-928) UDFs in scripting languages

2009-10-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766746#action_12766746
 ] 

Alan Gates commented on PIG-928:


I ran some quick and sloppy performance tests on this.  I ran it using both BSF 
and direct bindings to groovy.  I also ran it using the builtin TOKENIZE 
function in Pig.  I had it read 5000 lines of text.  The groovy (or TOKENIZE) 
functions handle splitting the line, then we do a standard group/count to count 
the words.  I got the following results:

Groovy using BSF:  55.070 seconds
Groovy direct bindings:  58.560 seconds
TOKENIZE:  2.554 seconds

So a 30x slowdown using this.  That's pretty painful.  I know string 
translation between languages can be bad.  I don't know how much of this is 
inter-language bindings and how much is Groovy.  When I get a chance I'll try 
this in Python and see if I get similar numbers.
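Working the ratios from the three timings above gives roughly 21.6x (BSF) and 22.9x (direct bindings) relative to TOKENIZE, so on these particular numbers the slowdown is somewhat under 30x. The tiny program below just does that division; the class and method names are illustrative.

```java
import java.util.Locale;

public class SlowdownRatios {

    /** How many times slower the first timing is than the second. */
    static double ratio(double slowSeconds, double fastSeconds) {
        return slowSeconds / fastSeconds;
    }

    public static void main(String[] args) {
        double tokenize = 2.554;      // seconds, builtin TOKENIZE
        double groovyBsf = 55.070;    // seconds, Groovy via BSF
        double groovyDirect = 58.560; // seconds, Groovy via direct bindings
        // Locale.ROOT keeps the decimal separator a '.' regardless of JVM locale.
        System.out.println(String.format(Locale.ROOT, "BSF vs TOKENIZE:    %.1fx",
                ratio(groovyBsf, tokenize)));    // prints 21.6x
        System.out.println(String.format(Locale.ROOT, "direct vs TOKENIZE: %.1fx",
                ratio(groovyDirect, tokenize))); // prints 22.9x
    }
}
```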

> UDFs in scripting languages
> ---
>
> Key: PIG-928
> URL: https://issues.apache.org/jira/browse/PIG-928
> Project: Pig
>  Issue Type: New Feature
>Reporter: Alan Gates
> Attachments: package.zip
>
>
> It should be possible to write UDFs in scripting languages such as python, 
> ruby, etc.  This frees users from needing to compile Java, generate a jar, 
> etc.  It also opens Pig to programmers who prefer scripting languages over 
> Java.




[jira] Resolved: (PIG-993) [zebra] Abitlity to drop a column group in a table

2009-10-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-993.


Resolution: Fixed

Patch checked in.

> [zebra] Abitlity to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, 
> TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, zebra-drop-cg.patch, 
> zebra-drop-cg.patch, zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple subtables, each containing a set of 
> columns called a column group (CG). The user specifies how these columns are 
> grouped while creating a table through the _storage hint_.
> For some large tables, it might be necessary for users to remove a set 
> of columns and retain the rest. This jira provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on API and the semantics. 




[jira] Commented: (PIG-993) [zebra] Abitlity to drop a column group in a table

2009-10-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766732#action_12766732
 ] 

Alan Gates commented on PIG-993:


We looked over the failure info and couldn't understand why it was failing.  
I've rerun the unit tests multiple times since and seen no issue.  We've run it 
on several different machines and not seen an issue.  So I'm going to declare 
the test failure a fluke and commit the patch.

> [zebra] Abitlity to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, 
> TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, zebra-drop-cg.patch, 
> zebra-drop-cg.patch, zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple subtables, each containing a set of 
> columns called a column group (CG). The user specifies how these columns are 
> grouped while creating a table through the _storage hint_.
> For some large tables, it might be necessary for users to remove a set 
> of columns and retain the rest. This jira provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on API and the semantics. 




[jira] Assigned: (PIG-1017) Converts strings to text in Pig

2009-10-16 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath reassigned PIG-1017:


Assignee: Sriranjan Manjunath

> Converts strings to text in Pig
> ---
>
> Key: PIG-1017
> URL: https://issues.apache.org/jira/browse/PIG-1017
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
>Assignee: Sriranjan Manjunath
> Attachments: stotext.patch
>
>
> Strings in Java are UTF-16 and take 2 bytes per character. Text 
> (org.apache.hadoop.io.Text) stores the data in UTF-8, which could yield 
> significant memory savings.
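To make the size argument concrete, the sketch below compares encoded byte counts using only the JDK. It does not use Hadoop's Text class; it assumes Text's payload matches String.getBytes(UTF_8). It counts character bytes only (no object headers or array overhead) and shows that UTF-8 is smaller for ASCII but larger for many CJK characters. Note also that JDK 9+ compact strings already store Latin-1-only Strings at one byte per character, so the in-memory gap depends on the JVM.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {

    /** Bytes for the string's characters in UTF-16 (little-endian, no byte-order mark). */
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    /** Bytes for the string's characters in UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "hello world";
        String cjk = "\u65e5\u672c\u8a9e"; // three CJK characters
        System.out.println("ascii: UTF-16=" + utf16Bytes(ascii) + " UTF-8=" + utf8Bytes(ascii)); // UTF-16=22 UTF-8=11
        System.out.println("cjk:   UTF-16=" + utf16Bytes(cjk) + " UTF-8=" + utf8Bytes(cjk));     // UTF-16=6 UTF-8=9
    }
}
```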




[jira] Updated: (PIG-1017) Converts strings to text in Pig

2009-10-16 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-1017:
-

Status: Patch Available  (was: Open)

> Converts strings to text in Pig
> ---
>
> Key: PIG-1017
> URL: https://issues.apache.org/jira/browse/PIG-1017
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
>Assignee: Sriranjan Manjunath
> Attachments: stotext.patch
>
>
> Strings in Java are UTF-16 and take 2 bytes per character. Text 
> (org.apache.hadoop.io.Text) stores the data in UTF-8, which could yield 
> significant memory savings.




[jira] Updated: (PIG-1017) Converts strings to text in Pig

2009-10-16 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-1017:
-

Attachment: stotext.patch

The patch will fail the MRCompiler and LogToPhyTransalator unit tests, since the 
golden files need to be replaced. The rest should pass.

> Converts strings to text in Pig
> ---
>
> Key: PIG-1017
> URL: https://issues.apache.org/jira/browse/PIG-1017
> Project: Pig
>  Issue Type: Improvement
>Reporter: Sriranjan Manjunath
> Attachments: stotext.patch
>
>
> Strings in Java are UTF-16 and take 2 bytes per character. Text 
> (org.apache.hadoop.io.Text) stores the data in UTF-8, which could yield 
> significant memory savings.




[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-16 Thread hc busy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766710#action_12766710
 ] 

hc busy commented on PIG-1016:
--

Okay, since my last comment I've verified that, in trunk, the patch in this 
ticket did not introduce an error. The skewed join (correct or not) returns 
the same number of rows when the data is read in from a nested data 
structure as when it is read in from a tuple.

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all 
> documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as value as 
> documented. I've done simple verification of loading in maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Created: (PIG-1026) [zebra] map split returns null

2009-10-16 Thread Jing Huang (JIRA)
[zebra] map split returns null
--

 Key: PIG-1026
 URL: https://issues.apache.org/jira/browse/PIG-1026
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Yan Zhou
 Fix For: 0.6.0


Here is the test scenario:
 final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
 //final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
 final static String STR_STORAGE = "[m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]";

Projection: String projection2 = new String("m1#{b}, m2#{x|z}");
The user got a NullPointerException when reading m1#{b}.

Yan, please refer to the test class:
TestNonDefaultWholeMapSplit.java 
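To make the projection in this scenario concrete: m1#{b} asks for key b of map column m1, and m2#{x|z} asks for keys x and z of map column m2. The hypothetical parser below only illustrates that reading of the syntax. It is not Zebra's projection parser, it handles only the simple comma-separated form used in projection2, and it does not attempt the bracketed column-group syntax of STR_STORAGE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical parser for projections such as "m1#{b}, m2#{x|z}". */
final class MapProjection {
    /** Returns column name -> requested map keys; an empty list means the whole column. */
    static Map<String, List<String>> parse(String projection) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String part : projection.split(",")) {
            String item = part.trim();
            if (item.isEmpty()) {
                continue;
            }
            int hash = item.indexOf('#');
            if (hash < 0) {
                result.computeIfAbsent(item, c -> new ArrayList<>()); // plain column
                continue;
            }
            String column = item.substring(0, hash);
            String keys = item.substring(hash + 1);
            if (!keys.startsWith("{") || !keys.endsWith("}")) {
                throw new IllegalArgumentException("bad map projection: " + item);
            }
            String inner = keys.substring(1, keys.length() - 1);
            // '|' separates the individual map keys requested from one column.
            result.computeIfAbsent(column, c -> new ArrayList<>())
                  .addAll(Arrays.asList(inner.split("\\|")));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("m1#{b}, m2#{x|z}")); // {m1=[b], m2=[x, z]}
    }
}
```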




[jira] Updated: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-16 Thread Kevin Weil (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Weil updated PIG-1025:


Attachment: PIG-1025.patch

> Should be able to set job priority through Pig Latin
> 
>
> Key: PIG-1025
> URL: https://issues.apache.org/jira/browse/PIG-1025
> Project: Pig
>  Issue Type: New Feature
>  Components: grunt
>Affects Versions: 0.4.0
>Reporter: Kevin Weil
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: PIG-1025.patch
>
>
> Currently users can set the job name through Pig Latin by saying
> set job.name 'my job name'
> The ability to set the priority would also be nice, and the patch should be 
> small.  The goal is to be able to say
> set job.priority 'high'
> and throw a JobCreationException in the JobControlCompiler if the priority is 
> not one of the allowed string values from the o.a.h.mapred.JobPriority enum: 
> very_low, low, normal, high, very_high.   Case insensitivity makes this a 
> little nicer.




[jira] Updated: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-16 Thread Kevin Weil (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Weil updated PIG-1025:


Fix Version/s: 0.6.0
   Status: Patch Available  (was: Open)

I just followed the same logic that "set job.name xyz" follows -- a very light 
feature add, but a useful one.

> Should be able to set job priority through Pig Latin
> 
>
> Key: PIG-1025
> URL: https://issues.apache.org/jira/browse/PIG-1025
> Project: Pig
>  Issue Type: New Feature
>  Components: grunt
>Affects Versions: 0.4.0
>Reporter: Kevin Weil
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: PIG-1025.patch
>
>
> Currently users can set the job name through Pig Latin by saying
> set job.name 'my job name'
> The ability to set the priority would also be nice, and the patch should be 
> small.  The goal is to be able to say
> set job.priority 'high'
> and throw a JobCreationException in the JobControlCompiler if the priority is 
> not one of the allowed string values from the o.a.h.mapred.JobPriority enum: 
> very_low, low, normal, high, very_high.   Case insensitivity makes this a 
> little nicer.




[jira] Updated: (PIG-1013) FINDBUGS: DMI_INVOKING_TOSTRING_ON_ARRAY: Invocation of toString on an array

2009-10-16 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1013:


Attachment: PIG-1013.patch

> FINDBUGS: DMI_INVOKING_TOSTRING_ON_ARRAY: Invocation of toString on an array
> 
>
> Key: PIG-1013
> URL: https://issues.apache.org/jira/browse/PIG-1013
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
> Attachments: PIG-1013.patch
>
>
> DMI   Invocation of toString on stackTraceLines in 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromStrings(String[],
>  int)
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(byte[])
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToDouble(byte[])
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToFloat(byte[])
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToInteger(byte[])
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToLong(byte[])
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToMap(byte[])
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToTuple(byte[])
> DMI   Invocation of toString on args in 
> org.apache.pig.impl.PigContext.instantiateFuncFromSpec(FuncSpec)
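Each of these warnings refers to the same pattern: calling toString() on an array returns the array's type and identity hash (for a byte[], something like [B@1b6d3586) rather than its contents. A minimal sketch of the pattern and two common fixes follows; the class and method names are hypothetical, and this is not the content of PIG-1013.patch. Decoding with an explicit charset is presumably what the Utf8StorageConverter methods intend, while Arrays.toString (or String.join) fits String[] values such as stackTraceLines.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrates the DMI_INVOKING_TOSTRING_ON_ARRAY pattern and typical fixes. */
final class ArrayToStringDemo {
    // The flagged pattern: Object.toString() on an array gives "[B@<hash>", not the data.
    static String buggy(byte[] b) {
        return b.toString();
    }

    // Fix for byte data: decode the bytes with an explicit charset.
    static String decodeUtf8(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // Fix for object arrays: render the elements themselves.
    static String describe(String[] lines) {
        return Arrays.toString(lines);
    }

    public static void main(String[] args) {
        byte[] b = "42".getBytes(StandardCharsets.UTF_8);
        System.out.println(buggy(b));                            // type and hash, not "42"
        System.out.println(Integer.parseInt(decodeUtf8(b)) + 1); // 43
        System.out.println(describe(new String[] {"a", "b"}));   // [a, b]
    }
}
```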




[jira] Updated: (PIG-790) Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond)

2009-10-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-790:
---

Status: Patch Available  (was: Open)

Looks like a transient unit test failure. Submitting again.

> Error message should indicate in which line number in the Pig script the 
> error occurred (debugging BinCond)
> --
>
> Key: PIG-790
> URL: https://issues.apache.org/jira/browse/PIG-790
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: error_rerport.pig, myerrordata.txt, PIG-790-1.patch, 
> pig_1240972895275.log
>
>
> I have a simple Pig script which loads integer data and does a bincond that 
> compares (col1 eq ''). An error message is generated in this case, but it 
> does not specify the line number in the script. 
> {code}
> MYDATA = load '/user/viraj/myerrordata.txt' using PigStorage() as (col1:int, col2:int);
> MYDATA_PROJECT = FOREACH MYDATA GENERATE ((col1 eq '') ? 1 : 0) as newcol1,
>                  ((col1 neq '') ? col1 - col2 : 16) as time_diff;
> dump MYDATA_PROJECT;
> {code}
> ==
> 2009-04-29 02:33:07,182 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localhost:9000
> 2009-04-29 02:33:08,584 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localhost:9001
> 2009-04-29 02:33:08,836 [main] INFO  org.apache.pig.PigServer - Create a new 
> graph.
> 2009-04-29 02:33:10,040 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1039: Incompatible types in EqualTo Operator left hand side:int right hand 
> side:chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1240972386081.log
> ==
> It would be good if the error message has a line number and a copy of the 
> line in the script which is causing the problem.
> Attaching data, script and log file. 
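The requested behavior amounts to carrying the script line number along with the error and echoing that line. The patch itself is not shown here, so the following is only a sketch of the message format, assuming the front end can report the 1-based line on which the type check failed; ScriptErrorFormatter and format are hypothetical names.

```java
/** Hypothetical helper: prefixes an error with its script line number and echoes that line. */
final class ScriptErrorFormatter {
    static String format(String script, int lineNumber, String message) {
        String[] lines = script.split("\r?\n", -1);
        if (lineNumber < 1 || lineNumber > lines.length) {
            return message; // location unknown: fall back to the bare message
        }
        return "Line " + lineNumber + ": " + message + "\n    " + lines[lineNumber - 1].trim();
    }

    public static void main(String[] args) {
        String script = "MYDATA = load 'myerrordata.txt' using PigStorage() as (col1:int, col2:int);\n"
                + "MYDATA_PROJECT = FOREACH MYDATA GENERATE ((col1 eq '') ? 1 : 0) as newcol1;";
        System.out.println(format(script, 2,
                "ERROR 1039: Incompatible types in EqualTo Operator left hand side:int right hand side:chararray"));
    }
}
```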




[jira] Updated: (PIG-1013) FINDBUGS: DMI_INVOKING_TOSTRING_ON_ARRAY: Invocation of toString on an array

2009-10-16 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1013:


Status: Patch Available  (was: Open)

> FINDBUGS: DMI_INVOKING_TOSTRING_ON_ARRAY: Invocation of toString on an array
> 
>
> Key: PIG-1013
> URL: https://issues.apache.org/jira/browse/PIG-1013
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
> Attachments: PIG-1013.patch
>
>
> DMI   Invocation of toString on stackTraceLines in 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getExceptionFromStrings(String[],
>  int)
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToBag(byte[])
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToDouble(byte[])
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToFloat(byte[])
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToInteger(byte[])
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToLong(byte[])
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToMap(byte[])
> DMI   Invocation of toString on b in 
> org.apache.pig.builtin.Utf8StorageConverter.bytesToTuple(byte[])
> DMI   Invocation of toString on args in 
> org.apache.pig.impl.PigContext.instantiateFuncFromSpec(FuncSpec)




[jira] Updated: (PIG-790) Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond)

2009-10-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-790:
---

Status: Open  (was: Patch Available)

> Error message should indicate in which line number in the Pig script the 
> error occurred (debugging BinCond)
> --
>
> Key: PIG-790
> URL: https://issues.apache.org/jira/browse/PIG-790
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: error_rerport.pig, myerrordata.txt, PIG-790-1.patch, 
> pig_1240972895275.log
>
>
> I have a simple Pig script which loads integer data and does a bincond that 
> compares (col1 eq ''). An error message is generated in this case, but it 
> does not specify the line number in the script. 
> {code}
> MYDATA = load '/user/viraj/myerrordata.txt' using PigStorage() as (col1:int, col2:int);
> MYDATA_PROJECT = FOREACH MYDATA GENERATE ((col1 eq '') ? 1 : 0) as newcol1,
>                  ((col1 neq '') ? col1 - col2 : 16) as time_diff;
> dump MYDATA_PROJECT;
> {code}
> ==
> 2009-04-29 02:33:07,182 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localhost:9000
> 2009-04-29 02:33:08,584 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localhost:9001
> 2009-04-29 02:33:08,836 [main] INFO  org.apache.pig.PigServer - Create a new 
> graph.
> 2009-04-29 02:33:10,040 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1039: Incompatible types in EqualTo Operator left hand side:int right hand 
> side:chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1240972386081.log
> ==
> It would be good if the error message has a line number and a copy of the 
> line in the script which is causing the problem.
> Attaching data, script and log file. 




[jira] Updated: (PIG-858) Order By followed by "replicated" join fails while compiling MR-plan from physical plan

2009-10-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-858:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Fix checked in.  Thanks Ashutosh for the patch and explanation.

> Order By followed by "replicated" join fails while compiling MR-plan from 
> physical plan
> ---
>
> Key: PIG-858
> URL: https://issues.apache.org/jira/browse/PIG-858
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.6.0
>
> Attachments: pig-858.patch
>
>
> Consider the query:
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0;
> explain C;
> {code}
> works. But if replicated join is used instead
> {code}
> A = load 'a';
> B = order A by $0;
> C = join A by $0, B by $0 using "replicated";
> explain C;
> {code}
> this fails with ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2034: Error 
> compiling operator POFRJoin
> relevant stacktrace:
> {code}
> Caused by: java.lang.RuntimeException: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:306)
>     at org.apache.pig.PigServer.explain(PigServer.java:574)
>     ... 8 more
> Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:942)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:173)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:342)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:327)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:233)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:301)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.explain(MapReduceLauncher.java:278)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:303)
>     ... 9 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:901)
>     ... 16 more
> {code}




[jira] Commented: (PIG-790) Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond)

2009-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1277#action_1277
 ] 

Hadoop QA commented on PIG-790:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422298/PIG-790-1.patch
  against trunk revision 825712.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/89/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/89/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/89/console

This message is automatically generated.

> Error message should indicate in which line number in the Pig script the 
> error occurred (debugging BinCond)
> --
>
> Key: PIG-790
> URL: https://issues.apache.org/jira/browse/PIG-790
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: error_rerport.pig, myerrordata.txt, PIG-790-1.patch, 
> pig_1240972895275.log
>
>
> I have a simple Pig script which loads integer data and does a bincond that 
> compares (col1 eq ''). An error message is generated in this case, but it 
> does not specify the line number in the script. 
> {code}
> MYDATA = load '/user/viraj/myerrordata.txt' using PigStorage() as (col1:int, col2:int);
> MYDATA_PROJECT = FOREACH MYDATA GENERATE ((col1 eq '') ? 1 : 0) as newcol1,
>                  ((col1 neq '') ? col1 - col2 : 16) as time_diff;
> dump MYDATA_PROJECT;
> {code}
> ==
> 2009-04-29 02:33:07,182 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localhost:9000
> 2009-04-29 02:33:08,584 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localhost:9001
> 2009-04-29 02:33:08,836 [main] INFO  org.apache.pig.PigServer - Create a new 
> graph.
> 2009-04-29 02:33:10,040 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1039: Incompatible types in EqualTo Operator left hand side:int right hand 
> side:chararray
> Details at logfile: /home/viraj/pig-svn/trunk/pig_1240972386081.log
> ==
> It would be good if the error message has a line number and a copy of the 
> line in the script which is causing the problem.
> Attaching data, script and log file. 




[jira] Created: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-16 Thread Kevin Weil (JIRA)
Should be able to set job priority through Pig Latin


 Key: PIG-1025
 URL: https://issues.apache.org/jira/browse/PIG-1025
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.4.0
Reporter: Kevin Weil
Priority: Minor


Currently users can set the job name through Pig Latin by saying

set job.name 'my job name'

The ability to set the priority would also be nice, and the patch should be 
small.  The goal is to be able to say

set job.priority 'high'

and throw a JobCreationException in the JobControlCompiler if the priority is 
not one of the allowed string values from the o.a.h.mapred.JobPriority enum: 
very_low, low, normal, high, very_high.   Case insensitivity makes this a 
little nicer.
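A minimal sketch of the proposed validation, assuming only what the description states: the value is matched case-insensitively against very_low, low, normal, high and very_high. To compile without Hadoop it uses a local enum mirroring those names instead of o.a.h.mapred.JobPriority, and IllegalArgumentException stands in for JobCreationException; JobPriorityParser is a hypothetical name.

```java
import java.util.Locale;

/** Sketch of case-insensitive validation for: set job.priority '<value>' */
final class JobPriorityParser {
    // Local mirror of the values listed in the issue, so no Hadoop jar is needed.
    enum Priority { VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW }

    static Priority parse(String value) {
        try {
            // Locale.ROOT keeps upper-casing independent of the default locale.
            return Priority.valueOf(value.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            // The proposal throws JobCreationException from JobControlCompiler here.
            throw new IllegalArgumentException("Unsupported job.priority '" + value
                    + "'; expected one of very_low, low, normal, high, very_high", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("high"));     // HIGH
        System.out.println(parse("Very_Low")); // VERY_LOW
    }
}
```

Upper-casing with Locale.ROOT avoids locale-specific surprises (for example, under a Turkish default locale "high" would not upper-case to "HIGH").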




[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data

2009-10-16 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-7.patch

I missed allowing an IOException to be thrown in commit() in 
CommittableStoreFunc and initialize() in IndexableLoadFunc in my previous patch 
- attaching new version with just that change.

> Enable merge join in pig to work with loaders and store functions which can 
> internally index sorted data 
> -
>
> Key: PIG-953
> URL: https://issues.apache.org/jira/browse/PIG-953
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.3.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-953-2.patch, PIG-953-3.patch, PIG-953-4.patch, 
> PIG-953-5.patch, PIG-953-6.patch, PIG-953-7.patch, PIG-953.patch
>
>
> Currently merge join implementation in pig includes construction of an index 
> on sorted data and use of that index to seek into the "right input" to 
> efficiently perform the join operation. Some loaders (notably the zebra 
> loader) internally implement an index on sorted data and can perform this 
> seek efficiently using their index. So the use of the index needs to be 
> abstracted in such a way that when the loader supports indexing, pig uses it 
> (indirectly through the loader) and does not construct an index. 
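The kind of index the description refers to can be sketched as a sparse, sorted list of (first key in block, block offset) pairs, so that seeking into the right input becomes a binary search. This is a generic illustration with int keys; SortedBlockIndex is hypothetical and does not reflect the format of Pig's or Zebra's actual index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sparse index over sorted data: first key of each block -> block offset. */
final class SortedBlockIndex {
    private final List<Integer> firstKeys = new ArrayList<>();
    private final List<Long> offsets = new ArrayList<>();

    /** Entries must be added in ascending key order, one per block. */
    void add(int firstKey, long offset) {
        if (!firstKeys.isEmpty() && firstKey < firstKeys.get(firstKeys.size() - 1)) {
            throw new IllegalArgumentException("index entries must be sorted");
        }
        firstKeys.add(firstKey);
        offsets.add(offset);
    }

    /**
     * Offset to seek to before scanning for 'key': the last block whose first key is
     * strictly less than 'key' (so runs of equal keys spanning a block boundary are
     * not skipped), or the first block if there is none.
     */
    long seekOffset(int key) {
        if (firstKeys.isEmpty()) {
            throw new IllegalStateException("empty index");
        }
        int lo = 0, hi = firstKeys.size() - 1, ans = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (firstKeys.get(mid) < key) {
                ans = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return offsets.get(ans);
    }

    public static void main(String[] args) {
        SortedBlockIndex idx = new SortedBlockIndex();
        idx.add(1, 0L);
        idx.add(5, 100L);
        idx.add(9, 200L);
        System.out.println(idx.seekOffset(6)); // 100
    }
}
```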




[jira] Updated: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-16 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-984:
-

Status: Patch Available  (was: Open)

> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: PIG-984.patch, PIG-984_1.patch, PIG-984_1.patch, 
> PIG-984_1.patch
>
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in reducers). This incurs disk writes/reads  between 
> mappers and reducers.
> However, in the cases where the input data has the following properties
>1. The records with the same key are grouped together (such as the data is 
> sorted by the keys).
>2. The records with the same key are in the same mapper input.
> the group by operation can be performed in the mappers only and thus remove 
> the overhead of disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> Adding using "mapside" to the group clause would introduce a map-side group 
> operator that collects all records for a given key into a buffer. When it 
> sees a key change, it will emit the key and a bag of the records it has 
> buffered. It will assume that all records for a given key are collected 
> together, so there is no need to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure the data emitted by the loader satisfies the above properties 
> (1) and (2).
> It will be the responsibility of the user (or the loader) to guarantee these 
> properties (1) & (2) before invoking the mapside hint for the group by 
> clause. The Pig runtime can't check for the errors in the input data.
> For the group by clauses with mapside hint, Pig Latin will only support group 
> by columns (including *), not group by expressions nor group all. 
>   
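The buffer-and-emit-on-key-change behavior described above can be sketched generically as follows. MapSideGroup is a hypothetical name, and records are modeled as key/value entries rather than Pig tuples and bags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiConsumer;

/** Sketch of a map-side group: buffer records for the current key, emit on key change. */
final class MapSideGroup {
    static <K, V> void group(Iterable<Map.Entry<K, V>> records, BiConsumer<K, List<V>> emit) {
        K currentKey = null;
        boolean haveKey = false;   // distinguishes "no records yet" from a null key
        List<V> bag = new ArrayList<>();
        for (Map.Entry<K, V> r : records) {
            if (haveKey && !Objects.equals(currentKey, r.getKey())) {
                emit.accept(currentKey, bag);  // key changed: flush the previous group
                bag = new ArrayList<>();
            }
            currentKey = r.getKey();
            haveKey = true;
            bag.add(r.getValue());
        }
        if (haveKey) {
            emit.accept(currentKey, bag);      // flush the last group
        }
    }

    public static void main(String[] args) {
        group(List.of(Map.entry("a", 1), Map.entry("a", 2), Map.entry("b", 3)),
              (k, bag) -> System.out.println(k + " " + bag)); // a [1, 2] then b [3]
    }
}
```

If property (1) does not hold, for example with input keys a, b, a, the sketch emits key a twice, which is why the runtime has to rely on the user or the loader to guarantee these properties.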




[jira] Updated: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-16 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-984:
-

Attachment: PIG-984_1.patch

Fix the compile errors.

> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: PIG-984.patch, PIG-984_1.patch, PIG-984_1.patch, 
> PIG-984_1.patch
>
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in reducers). This incurs disk writes/reads  between 
> mappers and reducers.
> However, in the cases where the input data has the following properties
>1. The records with the same key are grouped together (such as the data is 
> sorted by the keys).
>2. The records with the same key are in the same mapper input.
> the group by operation can be performed in the mappers only and thus remove 
> the overhead of disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> Adding using "mapside" to the group clause would introduce a map-side group 
> operator that collects all records for a given key into a buffer. When it 
> sees a key change, it will emit the key and a bag of the records it has 
> buffered. It will assume that all records for a given key are collected 
> together, so there is no need to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure the data emitted by the loader satisfies the above properties 
> (1) and (2).
> It will be the responsibility of the user (or the loader) to guarantee these 
> properties (1) & (2) before invoking the mapside hint for the group by 
> clause. The Pig runtime can't check for the errors in the input data.
> For the group by clauses with mapside hint, Pig Latin will only support group 
> by columns (including *), not group by expressions nor group all. 
>   




[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data

2009-10-16 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-6.patch

Dmitriy: by default, when the application does not set an OutputCommitter, 
Hadoop uses FileOutputCommitter. Since Pig (in trunk) does not set an 
OutputCommitter, Hadoop is currently using FileOutputCommitter. Hence I 
derived from FileOutputCommitter so that the current cleanup continues to 
happen and we also do the extra commit needed by Zebra.

The new load-store redesign already has an allFinished() method in storeFunc 
which is the same as this commit except it does not have the Configuration - I 
have modified it to have the Configuration parameter.

It turns out zebra needs the job configuration in order to open the right side 
file during merge join. Hence I am introducing an initialize(Configuration 
conf) method into the IndexableLoadFunc interface in the attached patch so that 
the pig runtime can call it allowing zebra to store this configuration for use 
in opening the right side file later.

> Enable merge join in pig to work with loaders and store functions which can 
> internally index sorted data 
> -
>
> Key: PIG-953
> URL: https://issues.apache.org/jira/browse/PIG-953
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.3.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-953-2.patch, PIG-953-3.patch, PIG-953-4.patch, 
> PIG-953-5.patch, PIG-953-6.patch, PIG-953.patch
>
>
> Currently merge join implementation in pig includes construction of an index 
> on sorted data and use of that index to seek into the "right input" to 
> efficiently perform the join operation. Some loaders (notably the zebra 
> loader) internally implement an index on sorted data and can perform this 
> seek efficiently using their index. So the use of the index needs to be 
> abstracted in such a way that when the loader supports indexing, pig uses it 
> (indirectly through the loader) and does not construct an index. 




[jira] Updated: (PIG-1011) FINDBUGS: SE_NO_SERIALVERSIONID: Class is Serializable, but doesn't define serialVersionUID

2009-10-16 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1011:


Attachment: PIG-1011.patch

> FINDBUGS: SE_NO_SERIALVERSIONID: Class is Serializable, but doesn't define 
> serialVersionUID
> ---
>
> Key: PIG-1011
> URL: https://issues.apache.org/jira/browse/PIG-1011
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
> Attachments: PIG-1011.patch
>
>
> SnVI  
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODistinct
>  is Serializable; consider declaring a serialVersionUID
> SnVI  
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORead
>  is Serializable; consider declaring a serialVersionUID
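The fix FindBugs suggests is to declare an explicit serialVersionUID so that the serialized form stays compatible across recompiles instead of depending on an ID the JVM derives from the class structure. A minimal sketch on a hypothetical operator class (not PODistinct or PORead themselves), including a serialize/deserialize round trip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical Serializable operator showing the SE_NO_SERIALVERSIONID fix. */
final class DistinctLikeOperator implements Serializable {
    // Explicit ID: without it, the JVM computes one from the class structure,
    // so unrelated edits to the class can break deserialization of old bytes.
    private static final long serialVersionUID = 1L;

    private final String alias;

    DistinctLikeOperator(String alias) {
        this.alias = alias;
    }

    String alias() {
        return alias;
    }

    /** Serializes and deserializes an instance in memory. */
    static DistinctLikeOperator roundTrip(DistinctLikeOperator op) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(op);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DistinctLikeOperator) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new DistinctLikeOperator("B")).alias()); // B
    }
}
```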




[jira] Updated: (PIG-1011) FINDBUGS: SE_NO_SERIALVERSIONID: Class is Serializable, but doesn't define serialVersionUID

2009-10-16 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1011:


Status: Patch Available  (was: Open)

> FINDBUGS: SE_NO_SERIALVERSIONID: Class is Serializable, but doesn't define 
> serialVersionUID
> ---
>
> Key: PIG-1011
> URL: https://issues.apache.org/jira/browse/PIG-1011
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
> Attachments: PIG-1011.patch
>
>
> SnVI  
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODistinct
>  is Serializable; consider declaring a serialVersionUID
> SnVI  
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORead
>  is Serializable; consider declaring a serialVersionUID




[jira] Commented: (PIG-761) ERROR 2086 on simple JOIN

2009-10-16 Thread Sean Timm (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766640#action_12766640
 ] 

Sean Timm commented on PIG-761:
---

I ran into this bug as well.  I am using Pig 0.4.0.  I did a LIMIT on one of 
the data sets to be joined too.  I worked around the problem by approximating 
the LIMIT with a FILTER.  I'll see if I can distill it down to a small 
reproducible test case.  I won't get time to do that for a week or two though.

> ERROR 2086 on simple JOIN
> -
>
> Key: PIG-761
> URL: https://issues.apache.org/jira/browse/PIG-761
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
> Environment: mapreduce mode
>Reporter: Vadim Zaliva
>
> ERROR 2086: Unexpected problem during optimization. Could not find all 
> LocalRearrange operators.
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 109
> I get this error doing a pretty straightforward join in one of my Pig scripts. I am 
> able to 'dump' both relations involved in this join, but when I try to join them I 
> get this error.
> Here is a full log:
> ERROR 2086: Unexpected problem during optimization. Could not find all
> LocalRearrange operators.
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable
> to store alias 109
>at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
>at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
>at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
>at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
>at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
>at org.apache.pig.Main.main(Main.java:319)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
> 2043: Unexpected error during execution.
>at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:274)
>at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:700)
>at org.apache.pig.PigServer.execute(PigServer.java:691)
>at org.apache.pig.PigServer.registerQuery(PigServer.java:292)
>... 5 more
> Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException:
> ERROR 2086: Unexpected problem during optimization. Could not find all
> LocalRearrange operators.
>at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.handlePackage(POPackageAnnotator.java:116)
>at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.visitMROp(POPackageAnnotator.java:88)
>at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:194)
>at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:43)
>at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
>at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>at 
> org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
>at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:198)
>at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:80)
>at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
>... 8 more
> ERROR 1002: Unable to store alias 398
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable
> to store alias 398
>at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
>at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
>at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
>at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
>at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
>at org.apache.pig.Main.main(Main.java:319)
> Caused by: java.lang.NullPointerException
>at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:669)
>at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:330)
>at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:41)
>at 
> org.apache.pig.imp

[jira] Updated: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-16 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1020:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to both trunk and 0.5 branch.

> Include an ant target to build pig.jar without hadoop libraries
> ---
>
> Key: PIG-1020
> URL: https://issues.apache.org/jira/browse/PIG-1020
> Project: Pig
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 0.4.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
>Priority: Minor
> Fix For: 0.5.0, 0.6.0
>
> Attachments: PIG-1020-1.patch, PIG-1020-2.patch, PIG-1020-3.patch
>
>
> Provide an ant target to build pig.jar without the Hadoop-related libraries. 
> Users will provide the external Hadoop jars on the classpath before invoking Pig.




[jira] Updated: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-16 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-984:
-

Status: Open  (was: Patch Available)

> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: PIG-984.patch, PIG-984_1.patch, PIG-984_1.patch
>
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in the reducers). This incurs disk writes/reads between 
> mappers and reducers.
> However, in cases where the input data has the following properties:
>   1. The records with the same key are grouped together (e.g., the data is 
> sorted by key).
>   2. The records with the same key are in the same mapper input.
> the group by operation can be performed in the mappers only, which removes 
> the overhead of disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> The proposed addition of using "mapside" to group will select a mapside group 
> operator that collects all records for a given key into a buffer. When it 
> sees a key change, it emits the key and a bag of the records it has buffered. 
> It assumes that all records for a given key are collected together, and 
> thus there is no need to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure the data emitted by the loader satisfies the above properties 
> (1) and (2).
> It will be the responsibility of the user (or the loader) to guarantee these 
> properties (1) & (2) before invoking the mapside hint for the group by 
> clause. The Pig runtime can't check for the errors in the input data.
> For group by clauses with the mapside hint, Pig Latin will support only grouping 
> by columns (including *), not grouping by expressions or group all. 
>   
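The buffering that the proposed operator performs can be sketched as follows. The sketch assumes records arrive already clustered by key, as properties (1) and (2) require. MapSideGrouper is a hypothetical name, not an operator in Pig. It holds only one key's records in memory at a time.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch of a map-side group: NOT an actual Pig operator.
// Assumes all records with the same key arrive consecutively in this mapper's input.
final class MapSideGrouper<K, R> implements Iterator<Map.Entry<K, List<R>>> {
    private final Iterator<R> input;     // one mapper's records, clustered by key
    private final Function<R, K> keyOf;  // extracts the group key from a record
    private R lookahead;                 // first record of the next group, if already read
    private boolean hasLookahead;

    MapSideGrouper(Iterator<R> input, Function<R, K> keyOf) {
        this.input = input;
        this.keyOf = keyOf;
    }

    @Override
    public boolean hasNext() {
        return hasLookahead || input.hasNext();
    }

    // Buffers records until the key changes, then emits (key, bag of buffered records).
    @Override
    public Map.Entry<K, List<R>> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        R first = hasLookahead ? lookahead : input.next();
        hasLookahead = false;
        K key = keyOf.apply(first);
        List<R> bag = new ArrayList<>();
        bag.add(first);
        while (input.hasNext()) {
            R record = input.next();
            if (Objects.equals(keyOf.apply(record), key)) {
                bag.add(record);
            } else {
                lookahead = record;   // key changed: hold this record for the next group
                hasLookahead = true;
                break;
            }
        }
        return new AbstractMap.SimpleImmutableEntry<>(key, bag);
    }
}
```

The input may violate the clustering assumption (for example, keys 1, 2, 1). In that case the sketch silently emits key 1 twice, which matches the point above that the runtime cannot detect such errors in the input.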




[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-16 Thread hc busy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766634#action_12766634
 ] 

hc busy commented on PIG-1016:
--

Thanks to everyone who is reviewing this ticket. I really appreciate it!

This feature is important because the data I have is slightly hierarchical 
(maps(string#map(:)). Sometimes I need to sort by the value corresponding to one 
key in the map, while other times I need to merge on the value corresponding to a 
different key of the map.

Besides running the unit tests, I also ran some join tests on data read with this 
parser. The results are all fine except for the skew join, which produced twice 
as many rows as it should have. Has anybody else encountered this problem, or is 
it only a result of taking values from a map?


thanks!

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all the 
> documentation states that the value of a map can be any type.
> I've attached a patch that allows us to read in complex objects as value as 
> documented. I've done simple verification of loading in maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.




[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-16 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-976:
-

Status: Open  (was: Patch Available)

> Multi-query optimization throws ClassCastException
> --
>
> Key: PIG-976
> URL: https://issues.apache.org/jira/browse/PIG-976
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Ankur
>Assignee: Richard Ding
> Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, 
> PIG-976.patch, PIG-976.patch
>
>
> Multi-query optimization fails to merge two branches when one is the result of 
> GROUP BY ALL and the other is the result of GROUP BY field1, where field1 is of 
> type long. Here is the script that fails with multi-query on:
> data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); 
> A = GROUP data ALL;
> B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2;
> C = FOREACH B GENERATE (sum1/sum2) AS rate; 
> STORE C INTO 'result1';
> D = GROUP data BY a; 
> E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c);
> STORE E into 'result2';
>  
> Here is the exception from the logs
> java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast 
> to org.apache.pig.data.DataBag
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264)
>   at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174)
>   at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786)
>   at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
>   at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206)




[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-16 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-976:
-

Status: Patch Available  (was: Open)




[jira] Commented: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data

2009-10-16 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766622#action_12766622
 ] 

Dmitriy V. Ryaboy commented on PIG-953:
---

Pradeep, it seems like PigOutputCommiter should extend OutputCommitter rather 
than FileOutputCommitter. 
Also -- add this requirement to the StoreFunc redesign proposal?



> Enable merge join in pig to work with loaders and store functions which can 
> internally index sorted data 
> -
>
> Key: PIG-953
> URL: https://issues.apache.org/jira/browse/PIG-953
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.3.0
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Attachments: PIG-953-2.patch, PIG-953-3.patch, PIG-953-4.patch, 
> PIG-953-5.patch, PIG-953.patch
>
>
> Currently, the merge join implementation in Pig constructs an index on 
> sorted data and uses that index to seek into the "right input" to 
> perform the join efficiently. Some loaders (notably the Zebra 
> loader) internally implement an index on sorted data and can perform this 
> seek efficiently using their own index. So the use of the index needs to be 
> abstracted in such a way that when the loader supports indexing, Pig uses it 
> (indirectly through the loader) and does not construct an index. 
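The index-and-seek step described above can be sketched as a sparse index over the sorted right input. The index holds one entry per block: the block's first key and its offset. SparseIndex and its methods are hypothetical. They are neither Pig's merge-join index nor Zebra's. The issue's point is that a loader with its own index could answer this lookup itself, so Pig would not need to build the index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sparse index over a sorted input: one (firstKey, offset) entry per block.
final class SparseIndex<K extends Comparable<K>> {
    private final List<K> firstKeys = new ArrayList<>();
    private final List<Long> offsets = new ArrayList<>();

    // Blocks must be added in file order, i.e. with non-decreasing first keys.
    void addBlock(K firstKey, long offset) {
        if (!firstKeys.isEmpty()
                && firstKeys.get(firstKeys.size() - 1).compareTo(firstKey) > 0) {
            throw new IllegalArgumentException("blocks must be added in sorted order");
        }
        firstKeys.add(firstKey);
        offsets.add(offset);
    }

    // Returns the offset of the earliest block that may contain `key`.
    long seekOffset(K key) {
        if (firstKeys.isEmpty()) {
            throw new IllegalStateException("empty index");
        }
        // Binary search for the lower bound: the first block whose first key is >= key.
        int lo = 0;
        int hi = firstKeys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (firstKeys.get(mid).compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // A run of `key` may begin at the end of the preceding block, so step back one block.
        return offsets.get(Math.max(0, lo - 1));
    }
}
```

For example, with blocks starting at keys 1, 5, 5 and 9, a lookup for 5 seeks to the first block, because that block may end with records whose key is 5.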




[jira] Commented: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-16 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766613#action_12766613
 ] 

Olga Natkovich commented on PIG-1020:
-

+1




[jira] Updated: (PIG-993) [zebra] Abitlity to drop a column group in a table

2009-10-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-993:
---

Status: Open  (was: Patch Available)

> [zebra] Abitlity to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, 
> TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, zebra-drop-cg.patch, 
> zebra-drop-cg.patch, zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple sub-tables, each containing a set of 
> columns called a column group (CG). The user specifies how these columns are 
> grouped when creating a table, through the _storage hint_.
> For some large tables, users may need to remove a set 
> of columns and retain the rest. This jira provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on the API and the semantics. 
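The column-group layout described above can be modeled with a small sketch. ColumnGroupTable and its methods are hypothetical and are not Zebra's API. The sketch only shows that deletion works on a whole column group: every column stored in that sub-table disappears together, and the other groups stay intact.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a table split into column groups; not Zebra's API.
final class ColumnGroupTable {
    // Column-group name -> the columns stored together in that sub-table.
    private final Map<String, List<String>> groups = new LinkedHashMap<>();

    void addColumnGroup(String name, List<String> columns) {
        if (groups.containsKey(name)) {
            throw new IllegalArgumentException("duplicate column group: " + name);
        }
        groups.put(name, new ArrayList<>(columns));
    }

    // Deletion is per column group: all of its columns are removed at once.
    // Returns false if no group with that name exists.
    boolean dropColumnGroup(String name) {
        return groups.remove(name) != null;
    }

    // Columns still present, in column-group order.
    List<String> columns() {
        List<String> all = new ArrayList<>();
        for (List<String> cols : groups.values()) {
            all.addAll(cols);
        }
        return all;
    }
}
```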




[jira] Updated: (PIG-993) [zebra] Abitlity to drop a column group in a table

2009-10-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-993:
---

Attachment: TEST-org.apache.hadoop.zebra.io.TestCheckin.txt

When I run the zebra unit tests for this patch, I get a failure in 
org.apache.hadoop.zebra.io.TestCheckin. The output of that test is attached.




[jira] Commented: (PIG-993) [zebra] Abitlity to drop a column group in a table

2009-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766581#action_12766581
 ] 

Hadoop QA commented on PIG-993:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12421881/zebra-drop-cg.patch
  against trunk revision 825712.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 33 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/88/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/88/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/88/console

This message is automatically generated.

> [zebra] Abitlity to drop a column group in a table
> --
>
> Key: PIG-993
> URL: https://issues.apache.org/jira/browse/PIG-993
> Project: Pig
>  Issue Type: Bug
>Reporter: Raghu Angadi
>Assignee: Raghu Angadi
> Fix For: 0.6.0
>
> Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, 
> zebra-drop-cg.patch, zebra-drop-cg.patch
>
>
> A Zebra table is stored as multiple sub-tables, each containing a set of 
> columns called a column group (CG). The user specifies how these columns are 
> grouped when creating a table, through the _storage hint_.
> For some large tables, users may need to remove a set 
> of columns and retain the rest. This jira provides a way for users to delete 
> an entire column group. 
> The following comments will have more details on the API and the semantics. 




Hudson build is back to normal: Pig-trunk #591

2009-10-16 Thread Apache Hudson Server
See 




[jira] Commented: (PIG-927) null should be handled consistently in Join

2009-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766500#action_12766500
 ] 

Hadoop QA commented on PIG-927:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422271/PIG-927-2.patch
  against trunk revision 825712.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/87/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/87/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/87/console

This message is automatically generated.

> null should be handled consistently in Join
> ---
>
> Key: PIG-927
> URL: https://issues.apache.org/jira/browse/PIG-927
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-927-1.patch, PIG-927-2.patch
>
>
> Currently, Pig mostly follows SQL semantics for handling nulls. However, there 
> are certain cases where Pig does not handle nulls consistently. One example 
> is the join: in a join on a single key, null keys do not match, so they produce 
> no output. However, if the join is on more than one key and one of the values 
> in the key tuple is null, that tuple still matches another key tuple which has 
> a null for that value. We need to decide the right semantics here. 
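The sketch below contrasts the two behaviors described above. JoinKeyMatch is a hypothetical helper, not Pig code, and it models key tuples as java.util.List values. Under SQL semantics, a null in any key position prevents a match. Plain positional tuple equality, which java.util.List.equals implements, treats null as equal to null, so two (x, null) keys match.

```java
import java.util.List;

// Hypothetical helper contrasting two null-handling rules for join keys; not Pig code.
final class JoinKeyMatch {
    // SQL semantics: a null in any key component means the rows never match.
    static boolean sqlMatches(List<?> left, List<?> right) {
        if (left.size() != right.size()) {
            return false;
        }
        for (int i = 0; i < left.size(); i++) {
            Object l = left.get(i);
            Object r = right.get(i);
            if (l == null || r == null || !l.equals(r)) {
                return false;
            }
        }
        return true;
    }

    // Positional tuple equality, as described for multi-key joins:
    // List.equals compares element by element and treats null as equal to null.
    static boolean tupleEqualityMatches(List<?> left, List<?> right) {
        return left.equals(right);
    }
}
```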

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1020) Include an ant target to build pig.jar without hadoop libraries

2009-10-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766451#action_12766451
 ] 

Hadoop QA commented on PIG-1020:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422263/PIG-1020-3.patch
  against trunk revision 825712.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/86/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/86/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/86/console

This message is automatically generated.




[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-16 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766444#action_12766444
 ] 

Daniel Dai commented on PIG-1016:
-

I think the problem is that in the current TextDataParser a map is defined as 
String#String, and a string excludes special characters such as "(", ")" and ",", so 
busy has no way to produce a tuple in the value field of a map. The approach 
busy took looks valid to me.
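Daniel's diagnosis points to the shape of a fix: a map value should dispatch on its first character instead of being read as a flat string. Below is a minimal recursive-descent sketch of that idea. It assumes the text forms [k#v,...] for maps, (a,b) for tuples and {(a),(b)} for bags. NestedTextParser is hypothetical and is neither Pig's TextDataParser nor the attached patch. It does not handle quoting, escaping or typed atoms, and every atom stays a String.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical recursive-descent parser; not Pig's TextDataParser.
// Maps -> Map<String, Object>, tuples and bags -> List<Object>, atoms -> String.
final class NestedTextParser {
    private final String text;
    private int pos;

    private NestedTextParser(String text) {
        this.text = text;
    }

    static Object parse(String text) {
        NestedTextParser p = new NestedTextParser(text);
        Object value = p.parseValue();
        if (p.pos != text.length()) {
            throw p.error("trailing input");
        }
        return value;
    }

    // A value may be a map, tuple, bag, or plain atom, chosen by its first character.
    private Object parseValue() {
        if (pos < text.length()) {
            char c = text.charAt(pos);
            if (c == '[') return parseMap();
            if (c == '(') return parseTuple();
            if (c == '{') return parseBag();
        }
        return parseAtom(",)]}");
    }

    private Map<String, Object> parseMap() {
        expect('[');
        Map<String, Object> map = new LinkedHashMap<>();
        if (tryConsume(']')) return map;
        do {
            String key = parseAtom("#,]");
            expect('#');
            map.put(key, parseValue());   // the value may itself be a tuple, bag, or map
        } while (tryConsume(','));
        expect(']');
        return map;
    }

    private List<Object> parseTuple() {
        expect('(');
        List<Object> fields = new ArrayList<>();
        if (tryConsume(')')) return fields;
        do {
            fields.add(parseValue());
        } while (tryConsume(','));
        expect(')');
        return fields;
    }

    private List<Object> parseBag() {
        expect('{');
        List<Object> tuples = new ArrayList<>();
        if (tryConsume('}')) return tuples;
        do {
            tuples.add(parseTuple());
        } while (tryConsume(','));
        expect('}');
        return tuples;
    }

    // Reads characters up to (not including) any terminator or the end of input.
    private String parseAtom(String terminators) {
        int start = pos;
        while (pos < text.length() && terminators.indexOf(text.charAt(pos)) < 0) pos++;
        return text.substring(start, pos);
    }

    private boolean tryConsume(char c) {
        if (pos < text.length() && text.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void expect(char c) {
        if (!tryConsume(c)) throw error("expected '" + c + "'");
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at position " + pos + " in: " + text);
    }
}
```

For example, parsing [name#(alice,30),tags#{(x),(y)}] yields a map whose "name" value is the tuple (alice, 30) and whose "tags" value is a bag of two tuples. A String#String grammar cannot express either value.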
