[jira] Assigned: (PIG-1027) Number of bytes written are always zero in local mode

2009-10-20 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang reassigned PIG-1027:
---

Assignee: Jeff Zhang

 Number of bytes written are always zero in local mode
 -

 Key: PIG-1027
 URL: https://issues.apache.org/jira/browse/PIG-1027
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Ashutosh Chauhan
Assignee: Jeff Zhang
Priority: Minor

 Consider this very simple script containing few records
 {code}
 a = load 'foo';
 store a into 'out';
 {code}
 Following message gets printed on grunt shell:
 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 
 Records written : 39
 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 
 Bytes written : 0
 File has 39 records which is correctly reported. But number of bytes is 
 always reported as zero, no matter what.  I am observing this on latest 
 trunk, not sure if this existed on previous/current releases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1029) HBaseStorage is way too slow to be usable

2009-10-20 Thread Vincent BARAT (JIRA)
HBaseStorage is way too slow to be usable
-

 Key: PIG-1029
 URL: https://issues.apache.org/jira/browse/PIG-1029
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Vincent BARAT


I have performed a set of benchmarks on HBaseStorage loader, using PIG 0.4.0 
and HBase 0.20.0 (using the patch referred in 
https://issues.apache.org/jira/browse/PIG-970) and Hadoop 0.20.0.

The HBaseStorage loader is basically 10x slower than the PigStorage loader.

To bypass this limitation, I had to read my HBase tables, write them to a 
Hadoop file and then use this file as input for my subsequent computations.

I am reporting this bug for tracking; I will try to see if I can optimise this a bit.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Pig-trunk #594

2009-10-20 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/594/changes

Changes:

[daijy] PIG-644: Duplicate column names in foreach do not throw parser error

--
[...truncated 2544 lines...]

ivy-init-dirs:

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:

ivy-buildJar:
[ivy:resolve] :: resolving dependencies :: 
org.apache.pig#Pig;2009-10-20_10-05-51
[ivy:resolve]   confs: [buildJar]
[ivy:resolve]   found com.jcraft#jsch;0.1.38 in maven2
[ivy:resolve]   found jline#jline;0.9.94 in maven2
[ivy:resolve]   found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve]   found junit#junit;4.5 in default
[ivy:resolve] :: resolution report :: resolve 68ms :: artifacts dl 4ms
-
|  |modules||   artifacts   |
|   conf   | number| search|dwnlded|evicted|| number|dwnlded|
-
| buildJar |   4   |   0   |   0   |   0   ||   4   |   0   |
-
[ivy:retrieve] :: retrieving :: org.apache.pig#Pig
[ivy:retrieve]  confs: [buildJar]
[ivy:retrieve]  1 artifacts copied, 3 already retrieved (288kB/5ms)

buildJar:
 [echo] svnString 827023
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/pig-2009-10-20_10-05-51.jar
 [copy] Copying 1 file to 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk

jarWithOutSvn:

findbugs:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs
 [findbugs] Executing findbugs from ant task
 [findbugs] Running FindBugs...
 [findbugs] The following classes needed for analysis were missing:
 [findbugs]   com.jcraft.jsch.SocketFactory
 [findbugs]   com.jcraft.jsch.Logger
 [findbugs]   jline.Completor
 [findbugs]   com.jcraft.jsch.Session
 [findbugs]   com.jcraft.jsch.HostKeyRepository
 [findbugs]   com.jcraft.jsch.JSch
 [findbugs]   com.jcraft.jsch.UserInfo
 [findbugs]   jline.ConsoleReaderInputStream
 [findbugs]   com.jcraft.jsch.HostKey
 [findbugs]   jline.ConsoleReader
 [findbugs]   com.jcraft.jsch.ChannelExec
 [findbugs]   jline.History
 [findbugs]   com.jcraft.jsch.ChannelDirectTCPIP
 [findbugs]   com.jcraft.jsch.JSchException
 [findbugs]   com.jcraft.jsch.Channel
 [findbugs] Warnings generated: 387
 [findbugs] Missing classes: 16
 [findbugs] Calculating exit code...
 [findbugs] Setting 'missing class' flag (2)
 [findbugs] Setting 'bugs found' flag (1)
 [findbugs] Exit code set to: 3
 [findbugs] Java Result: 3
 [findbugs] Classes needed for analysis were missing
 [findbugs] Output saved to 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.xml
 [xslt] Processing 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.xml
 to 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.html
 [xslt] Loading stylesheet 
/homes/gkesavan/tools/findbugs/latest/src/xsl/default.xsl

BUILD SUCCESSFUL
Total time: 2 minutes 47 seconds
+ mv build/pig-2009-10-20_10-05-51.tar.gz 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ mv build/test/findbugs 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ mv build/docs/api 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant clean
Buildfile: build.xml

clean:
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/src-gen
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/src/docs/build
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/test/org/apache/pig/test/utils/dotGraph/parser

BUILD SUCCESSFUL
Total time: 0 seconds
+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant 
-Dtest.junit.output.format=xml -Dtest.output=yes 
-Dcheckstyle.home=/homes/hudson/tools/checkstyle/latest -Drun.clover=true 
-Dclover.home=/homes/hudson/tools/clover/clover-ant-2.3.2 clover test 
generate-clover-reports
Buildfile: build.xml

clover.setup:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/clover/db
[clover-setup] Clover Version 2.3.2, built on July 15 2008 (build-732)
[clover-setup] Loaded from: 
/homes/hudson/tools/clover/clover-ant-2.3.2/lib/clover.jar
[clover-setup] Clover: Open Source License registered to Apache Software 
Foundation.
[clover-setup] Clover is enabled with initstring 
'http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/clover/db/pig_coverage.db'

clover.info:

clover:

test:

ivy-download:
  [get] 

[jira] Updated: (PIG-1027) Number of bytes written are always zero in local mode

2009-10-20 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-1027:


Attachment: Pig_1027.Patch

The cause of this bug is a path problem: the file name in the FileSpec includes the URI scheme.
When we create a new file, we should remove the scheme.
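
For reference, a minimal sketch of what stripping the scheme from a file name could look like. This is only an illustration under that assumption; it is not the attached Pig_1027.Patch, and the class and method names below are made up:

{code}
import java.net.URI;
import java.net.URISyntaxException;

public class StripScheme {
    // Illustration only: drop a URI scheme such as "file:" from a FileSpec-style
    // name before handing it to local file APIs that expect a plain path.
    public static String toLocalPath(String fileName) throws URISyntaxException {
        URI uri = new URI(fileName);
        // "file:/tmp/out" -> "/tmp/out"; names without a scheme pass through unchanged
        return uri.getScheme() == null ? fileName : uri.getPath();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(toLocalPath("file:/tmp/out")); // /tmp/out
        System.out.println(toLocalPath("out"));           // out
    }
}
{code}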

 Number of bytes written are always zero in local mode
 -

 Key: PIG-1027
 URL: https://issues.apache.org/jira/browse/PIG-1027
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Ashutosh Chauhan
Assignee: Jeff Zhang
Priority: Minor
 Attachments: Pig_1027.Patch


 Consider this very simple script containing few records
 {code}
 a = load 'foo';
 store a into 'out';
 {code}
 Following message gets printed on grunt shell:
 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 
 Records written : 39
 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 
 Bytes written : 0
 File has 39 records which is correctly reported. But number of bytes is 
 always reported as zero, no matter what.  I am observing this on latest 
 trunk, not sure if this existed on previous/current releases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1027) Number of bytes written are always zero in local mode

2009-10-20 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-1027:


Attachment: (was: Pig_1027.Patch)

 Number of bytes written are always zero in local mode
 -

 Key: PIG-1027
 URL: https://issues.apache.org/jira/browse/PIG-1027
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Ashutosh Chauhan
Assignee: Jeff Zhang
Priority: Minor
 Attachments: Pig_1027.Patch


 Consider this very simple script containing few records
 {code}
 a = load 'foo';
 store a into 'out';
 {code}
 Following message gets printed on grunt shell:
 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 
 Records written : 39
 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 
 Bytes written : 0
 File has 39 records which is correctly reported. But number of bytes is 
 always reported as zero, no matter what.  I am observing this on latest 
 trunk, not sure if this existed on previous/current releases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1027) Number of bytes written are always zero in local mode

2009-10-20 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-1027:


Attachment: Pig_1027.Patch

 Number of bytes written are always zero in local mode
 -

 Key: PIG-1027
 URL: https://issues.apache.org/jira/browse/PIG-1027
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Ashutosh Chauhan
Assignee: Jeff Zhang
Priority: Minor
 Attachments: Pig_1027.Patch


 Consider this very simple script containing few records
 {code}
 a = load 'foo';
 store a into 'out';
 {code}
 Following message gets printed on grunt shell:
 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 
 Records written : 39
 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 
 Bytes written : 0
 File has 39 records which is correctly reported. But number of bytes is 
 always reported as zero, no matter what.  I am observing this on latest 
 trunk, not sure if this existed on previous/current releases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767862#action_12767862
 ] 

Alan Gates commented on PIG-760:


I don't take javac or findbugs warnings as final truth.  If you can give a good 
reason why the warning is wrong or not relevant, or you've chosen to take that 
risk to get some other benefit (such as skipping an instanceof check before a cast 
for performance, where you believe the risk is acceptable), then put that in 
comments and suppress the warning in the code.
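
As a generic illustration of that practice (this is not taken from the Pig codebase, and it shows a javac warning rather than a findbugs one), the suppression is kept as narrow as possible and the justification is recorded right next to it:

{code}
import java.util.ArrayList;
import java.util.List;

public class SuppressionExample {
    // Safe to suppress here: callers only ever populate the raw list with Strings,
    // so the unchecked cast cannot fail at runtime.
    @SuppressWarnings("unchecked")
    static List<String> narrowRawList(List raw) {
        return (List<String>) raw;
    }

    public static void main(String[] args) {
        List raw = new ArrayList();
        raw.add("a");
        System.out.println(narrowRawList(raw).get(0)); // a
    }
}
{code}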

 Serialize schemas for PigStorage() and other storage types.
 ---

 Key: PIG-760
 URL: https://issues.apache.org/jira/browse/PIG-760
 Project: Pig
  Issue Type: New Feature
Reporter: David Ciemiewicz
 Attachments: pigstorageschema.patch


 I'm finding PigStorage() really convenient for storage and data interchange 
 because it compresses well and imports into Excel and other analysis 
 environments well.
 However, it is a pain when it comes to maintenance because the columns are in 
 fixed locations and I'd like to add columns in some cases.
 It would be great if load PigStorage() could read a default schema from a 
 .schema file stored with the data and if store PigStorage() could store a 
 .schema file with the data.
 I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
 will ignore a file called .schema in a directory of part files.
 So, for example, if I have a chain of Pig scripts I execute such as:
 A = load 'data-1' using PigStorage() as ( a: int , b: int );
 store A into 'data-2' using PigStorage();
 B = load 'data-2' using PigStorage();
 describe B;
 describe B should output something like { a: int, b: int }
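
A toy sketch of the sidecar idea described above, assuming the schema is kept as plain text in a file named .schema next to the part files; this is not the attached pigstorageschema.patch and the layout is only an assumption:

{code}
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class SchemaSidecar {
    // Write the declared schema next to the data on store.
    public static void writeSchema(File dataDir, String schema) throws IOException {
        FileOutputStream out = new FileOutputStream(new File(dataDir, ".schema"));
        try {
            out.write(schema.getBytes("UTF-8"));
        } finally {
            out.close();
        }
    }

    // Read the schema back on load; null means "no sidecar", i.e. fall back to
    // today's untyped-bytearray behaviour.
    public static String readSchema(File dataDir) throws IOException {
        File f = new File(dataDir, ".schema");
        if (!f.exists()) {
            return null;
        }
        byte[] buf = new byte[(int) f.length()];
        FileInputStream in = new FileInputStream(f);
        try {
            int off = 0;
            while (off < buf.length) {
                int n = in.read(buf, off, buf.length - off);
                if (n < 0) break;
                off += n;
            }
        } finally {
            in.close();
        }
        return new String(buf, "UTF-8");
    }
}
{code}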

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Complimentary Guest Invitation: Only 3 Weeks to go! Limited Places Remaining!

2009-10-20 Thread Innovation Across Europe 2009
SciTech Europe 2009: Innovation Across Europe: 
http://publicserviceevents.org.uk/4HD-28I8-10AI2A-1BLH4-0/c.aspx

The Square, Brussels
12th November 2009

Complimentary Guest Invitation: 
http://publicserviceevents.org.uk/4HD-28I8-10AI2A-1BLH1-0/c.aspx 

Dear Colleague,

We would like to personally invite you as our guest to SciTech Europe 2009: 
Innovation Across Europe, the event that will give you the opportunity to 
network and debate with academic experts and industry stakeholders as well as 
assist in outlining a collaborative approach to European science and technology 
frameworks.

As your organisation does not yet have any representation on the day, we do 
not want you to miss out on what will be an extremely important event. With 
only limited complimentary places remaining, avoid disappointment and REGISTER 
YOUR COMPLIMENTARY PLACE AT SCITECH EUROPE 2009: INNOVATION ACROSS EUROPE, 
where you will hear from some of the world's leading experts and most esteemed 
speakers involved in creating the policies and initiatives that will drive this 
important issue forward: 
http://publicserviceevents.org.uk/4HD-28I8-10AI2A-1BLH1-0/c.aspx 

Dr Roland Schenkel, Director-General of the Joint Research Centre, European 
Commission
High-tech research - importance and challenges for policy-making

Andreu Mas-Colell, Secretary-General, European Research Council
Making science investments with greater confidence
Ensuring that Europe is a centre for world-leading knowledge-based research 
innovation. What challenges must be addressed in order to better support the 
needs of science in Europe? Looking at the long-term vision, quality education 
and the role of the European Research Council in facilitating a better research 
environment for Europe. 

Professor Marja Makarow, Chief Executive, European Science Foundation
Collaborative European research
Considering the challenges of streamlining research and ensuring that the needs 
of society are better aligned to research outcomes, what are the main priorities 
for Europe in terms of building on strengths in science and research and 
creating the best research conditions for addressing the grand challenges such 
as climate change, disease and ageing populations?

Professor Dominique Foray, Chairman of Knowledge for Growth Group
The role of the European Research Area
How can Europe better exploit factors of productivity and stimulate the levels 
of entrepreneurship to ensure that the right specialisations are made in the 
right regions of Europe? How do we overcome both the scientific and the 
political challenges in order to achieve the key objectives set out in 
enhancing innovation and ultimately the success of science and research in 
Europe?

Asger Kej, Chief Executive Officer, DHI Group (Approved Technological Service 
Institute) 
International knowledge dissemination and technology-based innovations
Innovation is a core strategy in coping with the global climate challenges 
facing society. In the Danish national innovations system the nine approved 
technological service institutes (GTS) are core innovation drivers and 
cutting-edge knowledge disseminators. For many years the institutes have 
focused on clean-tech and sustainable energy in stimulating SME innovation. 
This has been a successful strategy but the challenges we are facing now call 
for broadened international perspective and cooperation.

Details of the other esteemed speakers, topics and more can be found HERE: 
http://publicserviceevents.org.uk/4HD-28I8-10AI2A-1BLH4-0/c.aspx

The new Square conference centre is in the heart of Brussels and has an 
abundance of hotel rooms in easy walking distance. The train station is 
attached to the facility so everything is literally on the doorstep: 
http://publicserviceevents.org.uk/4HD-28I8-10AI2A-1BLH3-0/c.aspx 

More details about hotels and travel can be found here: 
http://publicserviceevents.org.uk/4HD-28I8-10AI2A-1BLH2-0/c.aspx 

CONFIRM YOUR COMPLIMENTARY PLACE AT THIS LEADING EVENT TODAY: 
http://publicserviceevents.org.uk/4HD-28I8-10AI2A-1BLH1-0/c.aspx 

If you are unable to attend this event, please feel free to forward details of 
the event to a colleague

If you have any queries please don't hesitate to contact Matthew Warrilow or 
telephone +44 (0)161 832 7387

Please note that sponsors of this event may contact registered delegates 
post-event with regards to their services. Please inform us if you do not wish 
your details to be passed on.

This complimentary invitation is only open to new delegates.

PSCA International Ltd
City Wharf
New Bailey street
Manchester
Lancashire
England
M3 5ER 

T: + 44 (0)161 832 7387 
F: + 44 (0)161 832 7396

Registered in England
Co. Reg No. 4521155
Vat Reg No. 902 1814 62

Want to unsubscribe or change your details? 
http://publicserviceevents.org.uk/4HD-28I8-CA10AI2A4C/uns.aspx


[jira] Assigned: (PIG-760) Serialize schemas for PigStorage() and other storage types.

2009-10-20 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy reassigned PIG-760:
-

Assignee: Dmitriy V. Ryaboy

 Serialize schemas for PigStorage() and other storage types.
 ---

 Key: PIG-760
 URL: https://issues.apache.org/jira/browse/PIG-760
 Project: Pig
  Issue Type: New Feature
Reporter: David Ciemiewicz
Assignee: Dmitriy V. Ryaboy
 Attachments: pigstorageschema.patch


 I'm finding PigStorage() really convenient for storage and data interchange 
 because it compresses well and imports into Excel and other analysis 
 environments well.
 However, it is a pain when it comes to maintenance because the columns are in 
 fixed locations and I'd like to add columns in some cases.
 It would be great if load PigStorage() could read a default schema from a 
 .schema file stored with the data and if store PigStorage() could store a 
 .schema file with the data.
 I have tested this out and both Hadoop HDFS and Pig in -exectype local mode 
 will ignore a file called .schema in a directory of part files.
 So, for example, if I have a chain of Pig scripts I execute such as:
 A = load 'data-1' using PigStorage() as ( a: int , b: int );
 store A into 'data-2' using PigStorage();
 B = load 'data-2' using PigStorage();
 describe B;
 describe B should output something like { a: int, b: int }

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-996) [zebra] Zebra build script does not have findbugs and clover targets.

2009-10-20 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767905#action_12767905
 ] 

Jing Huang commented on PIG-996:


+1
New patch reviewed.

 [zebra] Zebra build script does not have findbugs and clover targets.
 -

 Key: PIG-996
 URL: https://issues.apache.org/jira/browse/PIG-996
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.4.0
Reporter: Chao Wang
Assignee: Chao Wang
 Fix For: 0.6.0

 Attachments: patch_build, patch_build


 Zebra build script does not have findbugs and clover targets, causing the hudson 
 build process to fail on Zebra.
 This jira is to fix this by adding these two targets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1030) explain and dump not working with two UDFs inside inner plan of foreach

2009-10-20 Thread Ying He (JIRA)
explain and dump not working with two UDFs inside inner plan of foreach
---

 Key: PIG-1030
 URL: https://issues.apache.org/jira/browse/PIG-1030
 Project: Pig
  Issue Type: Bug
Reporter: Ying He


this script does not work

register /homes/yinghe/owl/string.jar;
a = load '/user/yinghe/a.txt' as (id, color);
b = group a all;
c = foreach b {
d = distinct a.color;
generate group, string.BagCount2(d), string.ColumnLen2(d, 0);
}

the UDFs are regular, not algebraic.

then if I call  dump c; or explain c, I would get  this error message.
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2019: Expected to find plan with 
single leaf. Found 2 leaves.

The error only occurs the first time; after getting this error, if I call 
dump c or explain c again, it succeeds.




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1030) explain and dump not working with two UDFs inside inner plan of foreach

2009-10-20 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-1030:
-

Description: 
this script does not work

register /homes/yinghe/owl/string.jar;
a = load '/user/yinghe/a.txt' as (id, color);
b = group a all;
c = foreach b {
d = distinct a.color;
generate group, string.BagCount2(d), string.ColumnLen2(d, 0);
}

the UDFs are regular, not algebraic.

then if I call  dump c; or explain c, I would get  this error message.
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2019: Expected to find plan with 
single leaf. Found 2 leaves.

The error only occurs the first time; after getting this error, if I call 
dump c or explain c again, it succeeds.




  was:
this scprit does not work

register /homes/yinghe/owl/string.jar;
a = load '/user/yinghe/a.txt' as (id, color);
b = group a all;
c = foreach b {
d = distinct a.color;
generate group, string.BagCount2(d), string.ColumnLen2(d, 0);
}

the udfs are regular, not algebraic.

then if I call  dump c; or explain c, I would get  this error message.
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2019: Expected to find plan with 
single leaf. Found 2 leaves.

The error only occurs forn the first time, after getting this error, if I call 
dump c or explain c again, it would succeed.





 explain and dump not working with two UDFs inside inner plan of foreach
 ---

 Key: PIG-1030
 URL: https://issues.apache.org/jira/browse/PIG-1030
 Project: Pig
  Issue Type: Bug
Reporter: Ying He

 this scprit does not work
 register /homes/yinghe/owl/string.jar;
 a = load '/user/yinghe/a.txt' as (id, color);
 b = group a all;
 c = foreach b {
 d = distinct a.color;
 generate group, string.BagCount2(d), string.ColumnLen2(d, 0);
 }
 the udfs are regular, not algebraic.
 then if I call  dump c; or explain c, I would get  this error message.
 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2019: Expected to find plan 
 with single leaf. Found 2 leaves.
 The error only occurs for the first time, after getting this error, if I call 
 dump c or explain c again, it would succeed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException

2009-10-20 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-976:
---

   Resolution: Fixed
Fix Version/s: 0.6.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Patch committed to trunk - Thanks Richard!

 Multi-query optimization throws ClassCastException
 --

 Key: PIG-976
 URL: https://issues.apache.org/jira/browse/PIG-976
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Ankur
Assignee: Richard Ding
 Fix For: 0.6.0

 Attachments: PIG-976.patch, PIG-976.patch, PIG-976.patch, 
 PIG-976.patch, PIG-976.patch


 Multi-query optimization fails to merge 2 branches when 1 is a result of 
 Group By ALL and another is a result of Group By field1 where field 1 is of 
 type long. Here is the script that fails with multi-query on.
 data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double); 
 A = GROUP data ALL;
 B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2;
 C = FOREACH B GENERATE (sum1/sum2) AS rate; 
 STORE C INTO 'result1';
 D = GROUP data BY a; 
 E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c);
 STORE E into 'result2';
  
 Here is the exception from the logs
 java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast 
 to org.apache.pig.data.DataBag
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
   at 
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1017) Converts strings to text in Pig

2009-10-20 Thread Sriranjan Manjunath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767952#action_12767952
 ] 

Sriranjan Manjunath commented on PIG-1017:
--

The release audit warnings are related to HTML files.

 Converts strings to text in Pig
 ---

 Key: PIG-1017
 URL: https://issues.apache.org/jira/browse/PIG-1017
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Attachments: stotext.patch


 Strings in Java are UTF-16 and take 2 bytes per character. Text 
 (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show 
 significant reductions in memory.
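
A small self-contained check of that size argument (assumes hadoop-core on the classpath; not part of the attached stotext.patch):

{code}
import org.apache.hadoop.io.Text;

public class TextVsString {
    public static void main(String[] args) {
        String s = "hello pig";
        Text t = new Text(s);
        // An ASCII value held as Text occupies one byte per character in its UTF-8
        // buffer, whereas java.lang.String keeps two bytes per char internally (UTF-16).
        System.out.println("UTF-16 code units in String: " + s.length());    // 9
        System.out.println("UTF-8 bytes held by Text   : " + t.getLength()); // 9
    }
}
{code}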

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-20 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1025:


Status: Open  (was: Patch Available)

This causes a number of unit test failures.  It seems that some reference in 
the configuration object is being set to null.  If you run 'ant test-commit' 
you'll see failures in TestMultiqueryLocal.  These same failures are showing up 
in a number of the tests.

 Should be able to set job priority through Pig Latin
 

 Key: PIG-1025
 URL: https://issues.apache.org/jira/browse/PIG-1025
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.4.0
Reporter: Kevin Weil
Priority: Minor
 Fix For: 0.6.0

 Attachments: PIG-1025.patch


 Currently users can set the job name through Pig Latin by saying
 set job.name 'my job name'
 The ability to set the priority would also be nice, and the patch should be 
 small.  The goal is to be able to say
 set job.priority 'high'
 and throw a JobCreationException in the JobControlCompiler if the priority is 
 not one of the allowed string values from the o.a.h.mapred.JobPriority enum: 
 very_low, low, normal, high, very_high.   Case insensitivity makes this a 
 little nicer.
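
A minimal sketch of the case-insensitive check described above; the real change would live in JobControlCompiler and throw Pig's JobCreationException, while this standalone illustration just uses IllegalArgumentException:

{code}
import org.apache.hadoop.mapred.JobPriority;

public class PriorityCheck {
    // Map a user-supplied string such as 'high' or 'Very_Low' onto the Hadoop enum,
    // rejecting anything that is not one of the five allowed values.
    static JobPriority parsePriority(String value) {
        try {
            return JobPriority.valueOf(value.toUpperCase());
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unknown job priority '" + value
                + "'; expected one of very_low, low, normal, high, very_high");
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePriority("high"));     // HIGH
        System.out.println(parsePriority("Very_Low")); // VERY_LOW
    }
}
{code}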

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1026) [zebra] map split returns null

2009-10-20 Thread Jing Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767983#action_12767983
 ] 

Jing Huang commented on PIG-1026:
-

Created a customer scenario with this schema and storage hint (TestJira1026.java):

 final static String STR_SCHEMA = "bcookie:bytes,yuid:bytes,ip:bytes,query_term:bytes,clickinfo:map(String),demog:map(String),page_params:map(String),viewinfo:collection(f1:map(String))";

 final static String STR_STORAGE = "[bcookie,yuid,ip,query_term];[clickinfo#{pos|sec|slk|targurl|cost|gpos},page_params#{ipc|vtestid|frcode|pagenum|query}];[clickinfo,page_params,demog];[viewinfo]";

Got a NullPointerException.

 [zebra] map split returns null
 --

 Key: PIG-1026
 URL: https://issues.apache.org/jira/browse/PIG-1026
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Yan Zhou
 Fix For: 0.6.0

 Attachments: MultipleKeyInMapSplitException.patch


 Here is the test scenario:
  final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
   //final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
  final static String STR_STORAGE = "[m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]";
 projection: String projection2 = new String("m1#{b}, m2#{x|z}");
 User got null pointer exception on reading m1#{b}.
 Yan, please refer to the test class:
 TestNonDefaultWholeMapSplit.java 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1027) Number of bytes written are always zero in local mode

2009-10-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767984#action_12767984
 ] 

Hadoop QA commented on PIG-1027:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422692/Pig_1027.Patch
  against trunk revision 826927.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/103/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/103/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/103/console

This message is automatically generated.

 Number of bytes written are always zero in local mode
 -

 Key: PIG-1027
 URL: https://issues.apache.org/jira/browse/PIG-1027
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Ashutosh Chauhan
Assignee: Jeff Zhang
Priority: Minor
 Attachments: Pig_1027.Patch


 Consider this very simple script containing few records
 {code}
 a = load 'foo';
 store a into 'out';
 {code}
 Following message gets printed on grunt shell:
 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 
 Records written : 39
 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 
 Bytes written : 0
 File has 39 records which is correctly reported. But number of bytes is 
 always reported as zero, no matter what.  I am observing this on latest 
 trunk, not sure if this existed on previous/current releases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Pig-trunk #595

2009-10-20 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/595/changes

Changes:

[pradeepkth] PIG-976: Multi-query optimization throws ClassCastException (rding 
via pradeepkth)

--
[...truncated 2558 lines...]

ivy-init-dirs:

ivy-probe-antlib:

ivy-init-antlib:

ivy-init:

ivy-buildJar:
[ivy:resolve] :: resolving dependencies :: 
org.apache.pig#Pig;2009-10-20_22-34-59
[ivy:resolve]   confs: [buildJar]
[ivy:resolve]   found com.jcraft#jsch;0.1.38 in maven2
[ivy:resolve]   found jline#jline;0.9.94 in maven2
[ivy:resolve]   found net.java.dev.javacc#javacc;4.2 in maven2
[ivy:resolve]   found junit#junit;4.5 in default
[ivy:resolve] :: resolution report :: resolve 53ms :: artifacts dl 4ms
-
|  |modules||   artifacts   |
|   conf   | number| search|dwnlded|evicted|| number|dwnlded|
-
| buildJar |   4   |   0   |   0   |   0   ||   4   |   0   |
-
[ivy:retrieve] :: retrieving :: org.apache.pig#Pig
[ivy:retrieve]  confs: [buildJar]
[ivy:retrieve]  1 artifacts copied, 3 already retrieved (288kB/4ms)

buildJar:
 [echo] svnString 827825
  [jar] Building jar: 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/pig-2009-10-20_22-34-59.jar
 [copy] Copying 1 file to 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk

jarWithOutSvn:

findbugs:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs
 [findbugs] Executing findbugs from ant task
 [findbugs] Running FindBugs...
 [findbugs] The following classes needed for analysis were missing:
 [findbugs]   com.jcraft.jsch.SocketFactory
 [findbugs]   com.jcraft.jsch.Logger
 [findbugs]   jline.Completor
 [findbugs]   com.jcraft.jsch.Session
 [findbugs]   com.jcraft.jsch.HostKeyRepository
 [findbugs]   com.jcraft.jsch.JSch
 [findbugs]   com.jcraft.jsch.UserInfo
 [findbugs]   jline.ConsoleReaderInputStream
 [findbugs]   com.jcraft.jsch.HostKey
 [findbugs]   jline.ConsoleReader
 [findbugs]   com.jcraft.jsch.ChannelExec
 [findbugs]   jline.History
 [findbugs]   com.jcraft.jsch.ChannelDirectTCPIP
 [findbugs]   com.jcraft.jsch.JSchException
 [findbugs]   com.jcraft.jsch.Channel
 [findbugs] Warnings generated: 386
 [findbugs] Missing classes: 16
 [findbugs] Calculating exit code...
 [findbugs] Setting 'missing class' flag (2)
 [findbugs] Setting 'bugs found' flag (1)
 [findbugs] Exit code set to: 3
 [findbugs] Java Result: 3
 [findbugs] Classes needed for analysis were missing
 [findbugs] Output saved to 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.xml
 [xslt] Processing 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.xml
 to 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/findbugs/pig-findbugs-report.html
 [xslt] Loading stylesheet 
/homes/gkesavan/tools/findbugs/latest/src/xsl/default.xsl

BUILD SUCCESSFUL
Total time: 2 minutes 44 seconds
+ mv build/pig-2009-10-20_22-34-59.tar.gz 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ mv build/test/findbugs 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ mv build/docs/api 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk
+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant clean
Buildfile: build.xml

clean:
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/src-gen
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/src/docs/build
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build
   [delete] Deleting directory 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/test/org/apache/pig/test/utils/dotGraph/parser

BUILD SUCCESSFUL
Total time: 0 seconds
+ /homes/hudson/tools/ant/apache-ant-1.7.0/bin/ant 
-Dtest.junit.output.format=xml -Dtest.output=yes 
-Dcheckstyle.home=/homes/hudson/tools/checkstyle/latest -Drun.clover=true 
-Dclover.home=/homes/hudson/tools/clover/clover-ant-2.3.2 clover test 
generate-clover-reports
Buildfile: build.xml

clover.setup:
[mkdir] Created dir: 
http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/clover/db
[clover-setup] Clover Version 2.3.2, built on July 15 2008 (build-732)
[clover-setup] Loaded from: 
/homes/hudson/tools/clover/clover-ant-2.3.2/lib/clover.jar
[clover-setup] Clover: Open Source License registered to Apache Software 
Foundation.
[clover-setup] Clover is enabled with initstring 
'http://hudson.zones.apache.org/hudson/job/Pig-trunk/ws/trunk/build/test/clover/db/pig_coverage.db'

clover.info:

clover:

test:

ivy-download:

[jira] Commented: (PIG-1012) FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in serializable class

2009-10-20 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767991#action_12767991
 ] 

Daniel Dai commented on PIG-1012:
-

+1, target findbugs warnings suppressed. 
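
For context, one common way this class of SE_BAD_FIELD warning is cleared is sketched below; it is a generic illustration, not the committed PIG-1012 patch:

{code}
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class SerializableOperator implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: the non-serializable Log is excluded from serialized state
    private transient Log log = LogFactory.getLog(SerializableOperator.class);

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // re-create the transient field after deserialization
        log = LogFactory.getLog(SerializableOperator.class);
    }
}
{code}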

 FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in 
 serializable class
 ---

 Key: PIG-1012
 URL: https://issues.apache.org/jira/browse/PIG-1012
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
 Attachments: PIG-1012.patch


 SeClass org.apache.pig.backend.executionengine.PigSlice defines 
 non-transient non-serializable instance field is
 SeClass org.apache.pig.backend.executionengine.PigSlice defines 
 non-transient non-serializable instance field loader
 Sejava.util.zip.GZIPInputStream stored into non-transient field 
 PigSlice.is
 Seorg.apache.pig.backend.datastorage.SeekableInputStream stored into 
 non-transient field PigSlice.is
 Seorg.apache.tools.bzip2r.CBZip2InputStream stored into non-transient 
 field PigSlice.is
 Seorg.apache.pig.builtin.PigStorage stored into non-transient field 
 PigSlice.loader
 Seorg.apache.pig.backend.hadoop.DoubleWritable$Comparator implements 
 Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigBagWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigCharArrayWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigDBAWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigDoubleWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigFloatWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigIntWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigLongWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigTupleWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigWritableComparator
  implements Comparator but not Serializable
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper 
 defines non-transient non-serializable instance field nig
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GreaterThanExpr
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GTOrEqualToExpr
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.LessThanExpr
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.LTOrEqualToExpr
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.NotEqualToExpr
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
  defines non-transient non-serializable instance field bagIterator
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserComparisonFunc
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux
  

[jira] Updated: (PIG-1012) FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in serializable class

2009-10-20 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1012:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

patch committed

 FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in 
 serializable class
 ---

 Key: PIG-1012
 URL: https://issues.apache.org/jira/browse/PIG-1012
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
 Attachments: PIG-1012.patch


 SeClass org.apache.pig.backend.executionengine.PigSlice defines 
 non-transient non-serializable instance field is
 SeClass org.apache.pig.backend.executionengine.PigSlice defines 
 non-transient non-serializable instance field loader
 Sejava.util.zip.GZIPInputStream stored into non-transient field 
 PigSlice.is
 Seorg.apache.pig.backend.datastorage.SeekableInputStream stored into 
 non-transient field PigSlice.is
 Seorg.apache.tools.bzip2r.CBZip2InputStream stored into non-transient 
 field PigSlice.is
 Seorg.apache.pig.builtin.PigStorage stored into non-transient field 
 PigSlice.loader
 Seorg.apache.pig.backend.hadoop.DoubleWritable$Comparator implements 
 Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigBagWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigCharArrayWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigDBAWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigDoubleWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigFloatWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigIntWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigLongWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigTupleWritableComparator
  implements Comparator but not Serializable
 Se
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigWritableComparator
  implements Comparator but not Serializable
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper 
 defines non-transient non-serializable instance field nig
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GreaterThanExpr
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GTOrEqualToExpr
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.LessThanExpr
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.LTOrEqualToExpr
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.NotEqualToExpr
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
  defines non-transient non-serializable instance field bagIterator
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserComparisonFunc
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage
  defines non-transient non-serializable instance field log
 SeClass 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux
  

[jira] Created: (PIG-1031) PigStorage interpreting chararray/bytearray for a tuple element inside a bag as float or double

2009-10-20 Thread Viraj Bhat (JIRA)
PigStorage interpreting chararray/bytearray for a tuple element inside a bag as 
float or double
---

 Key: PIG-1031
 URL: https://issues.apache.org/jira/browse/PIG-1031
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.5.0
Reporter: Viraj Bhat
 Fix For: 0.5.0, 0.6.0


I have data stored in a text file as:

{(4153E765)}
{(AF533765)}

I try reading it using PigStorage as:
{code}
A = load 'pigstoragebroken.dat' using PigStorage() as 
(intersectionBag:bag{T:tuple(term:bytearray)});
dump A;
{code}

I get the following results:

{code}
({(Infinity)})
({(AF533765)})
{code}

The problem seems to be with the method parseFromBytes(byte[] b) in class 
Utf8StorageConverter. This method uses the TextDataParser (class generated via 
jjt) to infer the type of the data from its content, even though the schema says 
it is a bytearray. 

TextDataParser.jjt sample code:
{code}
TOKEN :
{
...
  < DOUBLENUMBER: (["-","+"])? <FLOATINGPOINT> ( ["e","E"] (["-","+"])? <FLOATINGPOINT> )? >
  < FLOATNUMBER: <DOUBLENUMBER> (["f","F"])? >
...
}
{code}

I tried the following options, but they do not work either, as we still need to call 
bytesToBag(byte[] b) in the Utf8StorageConverter class.
{code}
A = load 'pigstoragebroken.dat' using PigStorage() as 
(intersectionBag:bag{T:tuple(term)});
A = load 'pigstoragebroken.dat' using PigStorage() as 
(intersectionBag:bag{T:tuple(term:chararray)});
{code}


Viraj
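
To make the intended behaviour concrete, here is an illustrative sketch only; it is not the Utf8StorageConverter code, and the class and method names are invented for the example:

{code}
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.DataType;

public class SchemaAwareBytes {
    // When the declared field type is bytearray, hand back the raw bytes instead of
    // letting a content-based parser guess that "4153E765" is a double.
    public static Object convert(byte[] b, byte declaredType) {
        if (declaredType == DataType.BYTEARRAY) {
            return new DataByteArray(b); // preserves the original characters untouched
        }
        return null; // other types would fall through to the existing parsing path
    }
}
{code}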

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1031) PigStorage interpreting chararray/bytearray for a tuple element inside a bag as float or double

2009-10-20 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-1031:


Description: 
I have a data stored in a text file as:

{(4153E765)}
{(AF533765)}


I try reading it using PigStorage as:

{code}
A = load 'pigstoragebroken.dat' using PigStorage() as 
(intersectionBag:bag{T:tuple(term:bytearray)});
dump A;
{code}

I get the following results:


({(Infinity)})
({(AF533765)})


The problem seems to be with the method: parseFromBytes(byte[] b) in class 
Utf8StorageConverter. This method uses the TextDataParser (class generated via 
jjt) to interpret the type of data from content, even though the schema tells 
it is a bytearray. 

TextDataParser.jjt  sample code
{code}
TOKEN :
{
...
  DOUBLENUMBER: ([-,+])? FLOATINGPOINT ( [e,E] ([ -,+])? 
FLOATINGPOINT )?
  FLOATNUMBER: DOUBLENUMBER ([f,F])? 
...
}
{code}

I tried the following options, but it will not work as we need to call 
bytesToBag(byte[] b) in the Utf8StorageConverter class.
{code}
A = load 'pigstoragebroken.dat' using PigStorage() as 
(intersectionBag:bag{T:tuple(term)});
A = load 'pigstoragebroken.dat' using PigStorage() as 
(intersectionBag:bag{T:tuple(term:chararray)});
{code}


Viraj

  was:
I have a data stored in a text file as:

{(4153E765)}
{(AF533765)}

I try reading it using PigStorage as:
{code}
A = load 'pigstoragebroken.dat' using PigStorage() as 
(intersectionBag:bag{T:tuple(term:bytearray)});
dump A;
{code}

I get the following results:

{code}
({(Infinity)})
({(AF533765)})
{code}

The problem seems to be with the method: parseFromBytes(byte[] b) in class 
Utf8StorageConverter. This method uses the TextDataParser (class generated via 
jjt) to interpret the type of data from content, even though the schema tells 
it is a bytearray. 

TextDataParser.jjt  sample code
{code}
TOKEN :
{
...
  DOUBLENUMBER: ([-,+])? FLOATINGPOINT ( [e,E] ([ -,+])? 
FLOATINGPOINT )?
  FLOATNUMBER: DOUBLENUMBER ([f,F])? 
...
}
{code}

I tried the following options, but it will not work as we need to call 
bytesToBag(byte[] b) in the Utf8StorageConverter class.
{code}
A = load 'pigstoragebroken.dat' using PigStorage() as 
(intersectionBag:bag{T:tuple(term)});
A = load 'pigstoragebroken.dat' using PigStorage() as 
(intersectionBag:bag{T:tuple(term:chararray)});
{code}


Viraj


 PigStorage interpreting chararray/bytearray for a tuple element inside a bag 
 as float or double
 ---

 Key: PIG-1031
 URL: https://issues.apache.org/jira/browse/PIG-1031
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.5.0
Reporter: Viraj Bhat
 Fix For: 0.5.0, 0.6.0


 I have a data stored in a text file as:
 {(4153E765)}
 {(AF533765)}
 I try reading it using PigStorage as:
 {code}
 A = load 'pigstoragebroken.dat' using PigStorage() as 
 (intersectionBag:bag{T:tuple(term:bytearray)});
 dump A;
 {code}
 I get the following results:
 ({(Infinity)})
 ({(AF533765)})
 The problem seems to be with the method: parseFromBytes(byte[] b) in class 
 Utf8StorageConverter. This method uses the TextDataParser (class generated 
 via jjt) to interpret the type of data from content, even though the schema 
 tells it is a bytearray. 
 TextDataParser.jjt  sample code
 {code}
 TOKEN :
 {
 ...
   DOUBLENUMBER: ([-,+])? FLOATINGPOINT ( [e,E] ([ -,+])? 
 FLOATINGPOINT )?
   FLOATNUMBER: DOUBLENUMBER ([f,F])? 
 ...
 }
 {code}
 I tried the following options, but it will not work as we need to call 
 bytesToBag(byte[] b) in the Utf8StorageConverter class.
 {code}
 A = load 'pigstoragebroken.dat' using PigStorage() as 
 (intersectionBag:bag{T:tuple(term)});
 A = load 'pigstoragebroken.dat' using PigStorage() as 
 (intersectionBag:bag{T:tuple(term:chararray)});
 {code}
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-790) Error message should indicate in which line number in the Pig script the error occured (debugging BinCond)

2009-10-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768007#action_12768007
 ] 

Alan Gates commented on PIG-790:


+1

 Error message should indicate in which line number in the Pig script the 
 error occured (debugging BinCond)
 --

 Key: PIG-790
 URL: https://issues.apache.org/jira/browse/PIG-790
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
Priority: Minor
 Fix For: 0.6.0

 Attachments: error_rerport.pig, myerrordata.txt, PIG-790-1.patch, 
 pig_1240972895275.log


 I have a simple Pig script which loads integer data and does a Bincond, where 
 it compares, (col1 eq ''). There is an error message that is generated in 
 this case, but it does not specify the line number in the script. 
 {code}
 MYDATA = load '/user/viraj/myerrordata.txt' using PigStorage() as (col1:int, 
 col2:int);
 MYDATA_PROJECT = FOREACH MYDATA GENERATE ((col1 eq '') ? 1 : 0) as newcol1,
  ((col1 neq '') ? col1 - col2 : 
 16)
 as time_diff;
 dump MYDATA_PROJECT;
 {code}
 ==
 2009-04-29 02:33:07,182 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: hdfs://localhost:9000
 2009-04-29 02:33:08,584 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: localhost:9001
 2009-04-29 02:33:08,836 [main] INFO  org.apache.pig.PigServer - Create a new 
 graph.
 2009-04-29 02:33:10,040 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1039: Incompatible types in EqualTo Operator left hand side:int right hand 
 side:chararray
 Details at logfile: /home/viraj/pig-svn/trunk/pig_1240972386081.log
 ==
 It would be good if the error message has a line number and a copy of the 
 line in the script which is causing the problem.
 Attaching data, script and log file. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-927) null should be handled consistently in Join

2009-10-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768010#action_12768010
 ] 

Alan Gates commented on PIG-927:


The new test doesn't seem to test this case.  Other than that the code looks 
good.  Nice comments too, made it easier to understand what was going on.

 null should be handled consistently in Join
 ---

 Key: PIG-927
 URL: https://issues.apache.org/jira/browse/PIG-927
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Pradeep Kamath
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-927-1.patch, PIG-927-2.patch


 Currently Pig mostly follows SQL semantics for handling null. However, there 
 are certain cases where Pig does not handle nulls consistently. One example 
 is join: joins on a single key result in null keys not matching, so no 
 output is produced. However, if the join is on more than one key and one of 
 the values in the key tuple is null, it still matches another key tuple which has 
 a null for that value. We need to decide the right semantics here. 
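 As a small, Pig-independent illustration of the semantic question (a sketch only, 
 not Pig's join code): under ordinary tuple/list equality two compound keys that 
 both contain a null compare as equal, which is the behaviour described above, 
 whereas SQL semantics would treat null as unknown and not match.
 {code}
 import java.util.Arrays;
 import java.util.List;

 // Illustration only: a compound key with a null matches another compound key
 // with a null in the same position under plain list equality.
 public class NullKeyDemo {
     public static void main(String[] args) {
         List<Integer> left  = Arrays.asList(1, null);
         List<Integer> right = Arrays.asList(1, null);
         System.out.println(left.equals(right)); // prints true
     }
 }
 {code}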

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-790) Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond)

2009-10-20 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768018#action_12768018
 ] 

Dmitriy V. Ryaboy commented on PIG-790:
---

This bit of code is repeated almost a dozen times:

{code}
String alias = currentAlias;
if (binOp.getAlias() != null)
    alias = binOp.getAlias();
String msg = "In alias " + alias + ", ";
{code}

This class is already clocking in at over 2500 lines.

Make it a helper function, shrink the class a bit?
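
Something along these lines would do it (a sketch only; the class and method 
names here are assumptions, not the actual patch):

{code}
// Sketch of the suggested helper. "currentAlias" and the message format come
// from the repeated snippet above; everything else is made up for illustration.
final class ErrorMessageUtil {
    private ErrorMessageUtil() {}

    /** Prefer the operator's own alias, fall back to the current alias. */
    static String aliasPrefix(String operatorAlias, String currentAlias) {
        String alias = (operatorAlias != null) ? operatorAlias : currentAlias;
        return "In alias " + alias + ", ";
    }
}
{code}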

 Error message should indicate in which line number in the Pig script the 
 error occurred (debugging BinCond)
 --

 Key: PIG-790
 URL: https://issues.apache.org/jira/browse/PIG-790
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
Priority: Minor
 Fix For: 0.6.0

 Attachments: error_rerport.pig, myerrordata.txt, PIG-790-1.patch, 
 pig_1240972895275.log


 I have a simple Pig script which loads integer data and does a BinCond that 
 compares (col1 eq ''). An error message is generated in this case, but it 
 does not specify the line number in the script. 
 {code}
 MYDATA = load '/user/viraj/myerrordata.txt' using PigStorage() as (col1:int, 
 col2:int);
 MYDATA_PROJECT = FOREACH MYDATA GENERATE ((col1 eq '') ? 1 : 0) as newcol1,
  ((col1 neq '') ? col1 - col2 : 
 16)
 as time_diff;
 dump MYDATA_PROJECT;
 {code}
 ==
 2009-04-29 02:33:07,182 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: hdfs://localhost:9000
 2009-04-29 02:33:08,584 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: localhost:9001
 2009-04-29 02:33:08,836 [main] INFO  org.apache.pig.PigServer - Create a new 
 graph.
 2009-04-29 02:33:10,040 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1039: Incompatible types in EqualTo Operator left hand side:int right hand 
 side:chararray
 Details at logfile: /home/viraj/pig-svn/trunk/pig_1240972386081.log
 ==
 It would be good if the error message has a line number and a copy of the 
 line in the script which is causing the problem.
 Attaching data, script and log file. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1032) FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) constructor

2009-10-20 Thread Olga Natkovich (JIRA)
FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new String(String) 
constructor
---

 Key: PIG-1032
 URL: https://issues.apache.org/jira/browse/PIG-1032
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich


Dm  Method 
org.apache.pig.backend.executionengine.PigSlice.init(DataStorage) invokes 
toString() method on a String
Dm  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.copyHadoopConfLocally(String)
 invokes inefficient new String(String) constructor
Dm  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getFirstLineFromMessage(String)
 invokes inefficient new String(String) constructor
Dm  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.BinaryComparisonOperator.initializeRefs()
 invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
Dm  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.ExpressionOperator.clone()
 invokes inefficient new String(String) constructor
Dm  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(String)
 invokes inefficient new String(String) constructor
Dm  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PONot.getNext(Boolean)
 invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
Dm  
org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone()
 invokes inefficient new String(String) constructor
Dm  new org.apache.pig.data.TimestampedTuple(String, String, int, 
SimpleDateFormat) invokes inefficient new String(String) constructor
Dm  org.apache.pig.impl.io.PigNullableWritable.toString() invokes 
inefficient new String(String) constructor
Dm  org.apache.pig.impl.logicalLayer.LOForEach.clone() invokes inefficient 
Boolean constructor; use Boolean.valueOf(...) instead
Dm  org.apache.pig.impl.logicalLayer.LOGenerate.clone() invokes inefficient 
Boolean constructor; use Boolean.valueOf(...) instead
Dm  org.apache.pig.impl.logicalLayer.LogicalPlan.clone() invokes 
inefficient new String(String) constructor
Dm  org.apache.pig.impl.logicalLayer.LOSort.clone() invokes inefficient 
Boolean constructor; use Boolean.valueOf(...) instead
Dm  
org.apache.pig.impl.logicalLayer.optimizer.ImplicitSplitInserter.transform(List)
 invokes inefficient Boolean constructor; use Boolean.valueOf(...) instead
Dm  
org.apache.pig.impl.logicalLayer.RemoveRedundantOperators.visit(LOProject) 
invokes inefficient new String(String) constructor
Dm  org.apache.pig.impl.logicalLayer.schema.Schema.getField(String) invokes 
inefficient new String(String) constructor
Dm  org.apache.pig.impl.logicalLayer.schema.Schema.reconcile(Schema) 
invokes inefficient new String(String) constructor
Dm  
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertCastForEachInBetweenIfNecessary(LogicalOperator,
 LogicalOperator, Schema) invokes inefficient Boolean constructor; use 
Boolean.valueOf(...) instead
Dm  
org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(Notification,
 Object) forces garbage collection; extremely dubious except in benchmarking 
code
Dm  org.apache.pig.pen.AugmentBaseDataVisitor.GetLargerValue(Object) 
invokes inefficient new String(String) constructor
Dm  org.apache.pig.pen.AugmentBaseDataVisitor.GetSmallerValue(Object) 
invokes inefficient new String(String) constructor
Dm  org.apache.pig.tools.cmdline.CmdLineParser.getNextOpt() invokes 
inefficient new String(String) constructor
Dm  org.apache.pig.tools.parameters.PreprocessorContext.substitute(String) 
invokes inefficient new String(String) constructor
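
For reference, these boil down to two patterns; a generic illustration of each 
and its usual fix (not the actual Pig code):

{code}
// Generic illustration of the flagged patterns and their replacements.
public class CtorWarningsDemo {
    public static void main(String[] args) {
        String original = "some value";
        String copy = new String(original);        // flagged: needless copy of an immutable String
        String reused = original;                   // fix: just reuse the reference

        boolean flag = true;
        Boolean boxedBad = new Boolean(flag);       // flagged: allocates a new Boolean every time
        Boolean boxedGood = Boolean.valueOf(flag);  // fix: returns the cached Boolean.TRUE / Boolean.FALSE

        System.out.println(copy.equals(reused) + " " + boxedBad.equals(boxedGood)); // true true
    }
}
{code}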

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1026) [zebra] map split returns null

2009-10-20 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1026:
--

Attachment: (was: MultipleKeyInMapSplitException.patch)

 [zebra] map split returns null
 --

 Key: PIG-1026
 URL: https://issues.apache.org/jira/browse/PIG-1026
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Yan Zhou
 Fix For: 0.6.0


 Here is the test scenario:
  final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
  //final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
  final static String STR_STORAGE = "[m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]";
 projection: String projection2 = new String("m1#{b}, m2#{x|z}");
 User got null pointer exception on reading m1#{b}.
 Yan, please refer to the test class:
 TestNonDefaultWholeMapSplit.java 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-790) Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond)

2009-10-20 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12768028#action_12768028
 ] 

Daniel Dai commented on PIG-790:


Definitely, I can create a helper function for that if necessary. 

 Error message should indicate in which line number in the Pig script the 
 error occurred (debugging BinCond)
 --

 Key: PIG-790
 URL: https://issues.apache.org/jira/browse/PIG-790
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
Priority: Minor
 Fix For: 0.6.0

 Attachments: error_rerport.pig, myerrordata.txt, PIG-790-1.patch, 
 pig_1240972895275.log


 I have a simple Pig script which loads integer data and does a BinCond that 
 compares (col1 eq ''). An error message is generated in this case, but it 
 does not specify the line number in the script. 
 {code}
 MYDATA = load '/user/viraj/myerrordata.txt' using PigStorage() as (col1:int, 
 col2:int);
 MYDATA_PROJECT = FOREACH MYDATA GENERATE ((col1 eq '') ? 1 : 0) as newcol1,
  ((col1 neq '') ? col1 - col2 : 
 16)
 as time_diff;
 dump MYDATA_PROJECT;
 {code}
 ==
 2009-04-29 02:33:07,182 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: hdfs://localhost:9000
 2009-04-29 02:33:08,584 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: localhost:9001
 2009-04-29 02:33:08,836 [main] INFO  org.apache.pig.PigServer - Create a new 
 graph.
 2009-04-29 02:33:10,040 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1039: Incompatible types in EqualTo Operator left hand side:int right hand 
 side:chararray
 Details at logfile: /home/viraj/pig-svn/trunk/pig_1240972386081.log
 ==
 It would be good if the error message has a line number and a copy of the 
 line in the script which is causing the problem.
 Attaching data, script and log file. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1026) [zebra] map split returns null

2009-10-20 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1026:


Comment: was deleted

(was: Created a customer scenario with this schema and storage hint: 
(TestJira1026.java)

 final static String STR_SCHEMA = "bcookie:bytes,yuid:bytes, 
ip:bytes,query_term:bytes,clickinfo:map(String),demog:map(String),page_params:map(String),viewinfo:collection(f1:map(String))";

 final static String STR_STORAGE = 
"[bcookie,yuid,ip,query_term];[clickinfo#{pos|sec|slk|targurl|cost|gpos},page_params#{ipc|vtestid|frcode|pagenum|query}];[clickinfo,page_params,demog];[viewinfo]";

Got NullPointerException.)

 [zebra] map split returns null
 --

 Key: PIG-1026
 URL: https://issues.apache.org/jira/browse/PIG-1026
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Yan Zhou
 Fix For: 0.6.0


 Here is the test scenario:
  final static String STR_SCHEMA = "m1:map(string),m2:map(map(int))";
  //final static String STR_STORAGE = "[m1#{a}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1]";
  final static String STR_STORAGE = "[m1#{a}, m2#{x}];[m2#{x|y}]; [m1#{b}, m2#{z}];[m1,m2]";
 projection: String projection2 = new String("m1#{b}, m2#{x|z}");
 User got null pointer exception on reading m1#{b}.
 Yan, please refer to the test class:
 TestNonDefaultWholeMapSplit.java 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

2009-10-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1022:


Attachment: PIG-1022-1.patch

Attach the patch. Thanks Santhosh for helping analyze the problem.

 optimizer pushes filter before the foreach that generates column used by 
 filter
 ---

 Key: PIG-1022
 URL: https://issues.apache.org/jira/browse/PIG-1022
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Attachments: PIG-1022-1.patch


 grunt> l = load 'students.txt' using PigStorage() as (name:chararray, 
 gender:chararray, age:chararray, score:chararray);
 grunt> f = foreach l generate name, gender, age, score, '200' as 
 gid:chararray;
 grunt> g = group f by (name, gid);
 grunt> f2 = foreach g generate group.name as name: chararray, group.gid as 
 gid: chararray;
 grunt> filt = filter f2 by gid == '200';
 grunt> explain filt;
 In the generated plan, filt is pushed up after the load and before the first 
 foreach, even though the filter is on gid, which is generated in the first foreach.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1022) optimizer pushes filter before the foreach that generates column used by filter

2009-10-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1022:


Fix Version/s: 0.6.0
Affects Version/s: 0.4.0
   Status: Patch Available  (was: Open)

 optimizer pushes filter before the foreach that generates column used by 
 filter
 ---

 Key: PIG-1022
 URL: https://issues.apache.org/jira/browse/PIG-1022
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1022-1.patch


 grunt> l = load 'students.txt' using PigStorage() as (name:chararray, 
 gender:chararray, age:chararray, score:chararray);
 grunt> f = foreach l generate name, gender, age, score, '200' as 
 gid:chararray;
 grunt> g = group f by (name, gid);
 grunt> f2 = foreach g generate group.name as name: chararray, group.gid as 
 gid: chararray;
 grunt> filt = filter f2 by gid == '200';
 grunt> explain filt;
 In the generated plan, filt is pushed up after the load and before the first 
 foreach, even though the filter is on gid, which is generated in the first foreach.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-747) Logical to Physical Plan Translation fails when temporary alias are created within foreach

2009-10-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-747:
---

Attachment: PIG-747-1.patch

 Logical to Physical Plan Translation fails when temporary alias are created 
 within foreach
 --

 Key: PIG-747
 URL: https://issues.apache.org/jira/browse/PIG-747
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Attachments: physicalplan.txt, physicalplanprob.pig, PIG-747-1.patch


 Consider the Pig script below, which calculates a new column F inside the 
 foreach:
 {code}
 A = load 'physicalplan.txt' as (col1,col2,col3);
 B = foreach A {
D = col1/col2;
E = col3/col2;
F = E - (D*D);
generate
F as newcol;
 };
 dump B;
 {code}
 This gives the following error:
 ===
 Caused by: 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
  ERROR 2015: Invalid physical operators in the physical plan
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:377)
 at 
 org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:63)
 at 
 org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:29)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:68)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:908)
 at 
 org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:122)
 at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:41)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
 at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:246)
 ... 10 more
 Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to give 
 operator of type 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide
  multiple outputs.  This operator does not support multiple outputs.
 at 
 org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:158)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:89)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:373)
 ... 19 more
 ===

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-747) Logical to Physical Plan Translation fails when temporary alias are created within foreach

2009-10-20 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-747:
---

Fix Version/s: 0.6.0
Affects Version/s: (was: 0.3.0)
   0.4.0
   Status: Patch Available  (was: Open)

 Logical to Physical Plan Translation fails when temporary alias are created 
 within foreach
 --

 Key: PIG-747
 URL: https://issues.apache.org/jira/browse/PIG-747
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Viraj Bhat
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: physicalplan.txt, physicalplanprob.pig, PIG-747-1.patch


 Consider the Pig script below, which calculates a new column F inside the 
 foreach:
 {code}
 A = load 'physicalplan.txt' as (col1,col2,col3);
 B = foreach A {
D = col1/col2;
E = col3/col2;
F = E - (D*D);
generate
F as newcol;
 };
 dump B;
 {code}
 This gives the following error:
 ===
 Caused by: 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
  ERROR 2015: Invalid physical operators in the physical plan
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:377)
 at 
 org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:63)
 at 
 org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:29)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:68)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:908)
 at 
 org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:122)
 at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:41)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
 at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:246)
 ... 10 more
 Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to give 
 operator of type 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide
  multiple outputs.  This operator does not support multiple outputs.
 at 
 org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:158)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:89)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:373)
 ... 19 more
 ===

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1033) javac warnings: deprecated hadoop APIs

2009-10-20 Thread Daniel Dai (JIRA)
javac warnings: deprecated hadoop APIs
--

 Key: PIG-1033
 URL: https://issues.apache.org/jira/browse/PIG-1033
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Daniel Dai
 Fix For: 0.6.0


Suppress javac warnings related to deprecated hadoop APIs.
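
The usual mechanism for this is the @SuppressWarnings annotation on the affected 
class or method; a minimal illustration (the Hadoop class below is just an example 
of a deprecated API, not necessarily one of the call sites in Pig):

{code}
// Illustration only: silencing the deprecation warning at the use site.
@SuppressWarnings("deprecation")
public class UsesDeprecatedHadoopApi {
    // JobConf is part of the old org.apache.hadoop.mapred API marked deprecated in 0.20.
    private final org.apache.hadoop.mapred.JobConf conf = new org.apache.hadoop.mapred.JobConf();
}
{code}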

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-20 Thread Kevin Weil (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Weil updated PIG-1025:


Status: Patch Available  (was: Open)

Attaching updated patch.  I'm still not sure how the last patch caused so many 
errors in MultiQueryLocal, but there was one spot where I would have 
effectively been calling PigContext.setProperty(jobPriority, null) if the 
priority was not set.  I just added a null check before that call, and I no-op 
if the user never set job.priority.  The patch now passes all tests for me when 
I run ant test-commit.  Thanks Alan for manually applying the patch to test it.
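
In other words, something like this (a simplified, self-contained sketch, not the 
exact patch; java.util.Properties stands in for Pig's and Hadoop's configuration 
objects):

{code}
import java.util.Properties;

// Simplified sketch of the null check: only propagate job.priority when the
// user actually set it, otherwise do nothing instead of setting a null value.
public class PriorityNullCheckDemo {
    public static void main(String[] args) {
        Properties scriptProps = new Properties(); // stands in for the Pig-level settings
        Properties jobProps = new Properties();    // stands in for what gets handed to Hadoop

        String priority = scriptProps.getProperty("job.priority");
        if (priority != null) {
            // only propagate the priority when the user actually set it
            jobProps.setProperty("job.priority", priority);
        }
        System.out.println(jobProps); // {} -- nothing set, and no NullPointerException
    }
}
{code}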

 Should be able to set job priority through Pig Latin
 

 Key: PIG-1025
 URL: https://issues.apache.org/jira/browse/PIG-1025
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.4.0
Reporter: Kevin Weil
Priority: Minor
 Fix For: 0.6.0

 Attachments: PIG-1025.patch, PIG-1025_2.patch


 Currently users can set the job name through Pig Latin by saying
 set job.name 'my job name'
 The ability to set the priority would also be nice, and the patch should be 
 small.  The goal is to be able to say
 set job.priority 'high'
 and throw a JobCreationException in the JobControlCompiler if the priority is 
 not one of the allowed string values from the o.a.h.mapred.JobPriority enum: 
 very_low, low, normal, high, very_high.   Case insensitivity makes this a 
 little nicer.
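 A rough sketch of the case-insensitive mapping this asks for (assuming the 
 o.a.h.mapred.JobPriority enum; the plain exception below just stands in for the 
 JobCreationException mentioned above):
 {code}
 import org.apache.hadoop.mapred.JobPriority;

 // Sketch only: map the user's string onto the enum, case-insensitively,
 // and fail with a clear message for anything outside the allowed values.
 public class JobPrioritySketch {
     static JobPriority parse(String value) {
         try {
             return JobPriority.valueOf(value.toUpperCase());
         } catch (IllegalArgumentException e) {
             throw new IllegalArgumentException("Unsupported job priority: " + value, e);
         }
     }

     public static void main(String[] args) {
         System.out.println(parse("high"));      // HIGH
         System.out.println(parse("Very_Low"));  // VERY_LOW
     }
 }
 {code}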

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1025) Should be able to set job priority through Pig Latin

2009-10-20 Thread Kevin Weil (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Weil updated PIG-1025:


Attachment: PIG-1025_2.patch

Updated patch with the null check.

 Should be able to set job priority through Pig Latin
 

 Key: PIG-1025
 URL: https://issues.apache.org/jira/browse/PIG-1025
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.4.0
Reporter: Kevin Weil
Priority: Minor
 Fix For: 0.6.0

 Attachments: PIG-1025.patch, PIG-1025_2.patch


 Currently users can set the job name through Pig Latin by saying
 set job.name 'my job name'
 The ability to set the priority would also be nice, and the patch should be 
 small.  The goal is to be able to say
 set job.priority 'high'
 and throw a JobCreationException in the JobControlCompiler if the priority is 
 not one of the allowed string values from the o.a.h.mapred.JobPriority enum: 
 very_low, low, normal, high, very_high.   Case insensitivity makes this a 
 little nicer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.