[jira] Updated: (PIG-712) Need utilities to create schemas for bags and tuples

2009-04-05 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-712:
---

Attachment: Pig_712_Patch_Merged.txt

I've merged the two patches into one, because the test case depends on the 
implementation.

> Need utilities to create schemas for bags and tuples
> 
>
> Key: PIG-712
> URL: https://issues.apache.org/jira/browse/PIG-712
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Santhosh Srinivasan
>Priority: Minor
> Fix For: 0.3.0
>
> Attachments: Pig_712_Patch_Merged.txt
>
>
> Pig should provide utilities to create bag and tuple schemas. Currently, 
> users return schemas from the outputSchema method and end up writing very 
> verbose boilerplate code. It would be very nice if Pig encapsulated the boilerplate 
> code in utility methods.
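To illustrate the kind of helper being requested, here is a minimal, self-contained Java sketch. The FieldSchema and Schema types below are hypothetical stand-ins, not Pig's schema classes. The tupleSchema/bagSchema names are illustrative only and are not an existing Pig API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: FieldSchema and Schema are minimal stand-ins, not Pig's schema classes.
class SchemaUtilSketch {

    // A named field: a type name plus, for tuples and bags, an inner schema.
    static final class FieldSchema {
        final String alias;
        final String type;
        final Schema inner; // null for scalar fields

        FieldSchema(String alias, String type, Schema inner) {
            this.alias = alias;
            this.type = type;
            this.inner = inner;
        }

        @Override
        public String toString() {
            return alias + ": " + type + (inner == null ? "" : inner.toString());
        }
    }

    // An ordered list of fields, printed as "(f1, f2, ...)".
    static final class Schema {
        final List<FieldSchema> fields = new ArrayList<>();

        @Override
        public String toString() {
            StringJoiner joiner = new StringJoiner(", ", "(", ")");
            for (FieldSchema f : fields) {
                joiner.add(f.toString());
            }
            return joiner.toString();
        }
    }

    // Hypothetical utility: build a tuple field from alternating alias/type pairs.
    static FieldSchema tupleSchema(String alias, String... aliasTypePairs) {
        if (aliasTypePairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected alias/type pairs");
        }
        Schema inner = new Schema();
        for (int i = 0; i < aliasTypePairs.length; i += 2) {
            inner.fields.add(new FieldSchema(aliasTypePairs[i], aliasTypePairs[i + 1], null));
        }
        return new FieldSchema(alias, "tuple", inner);
    }

    // Hypothetical utility: wrap a single tuple field in a bag field.
    static FieldSchema bagSchema(String alias, FieldSchema tuple) {
        Schema inner = new Schema();
        inner.fields.add(tuple);
        return new FieldSchema(alias, "bag", inner);
    }

    public static void main(String[] args) {
        // One expression instead of building and nesting each schema object by hand.
        FieldSchema bag = bagSchema("b", tupleSchema("t", "name", "chararray", "age", "int"));
        System.out.println(bag); // b: bag(t: tuple(name: chararray, age: int))
    }
}
```

The design point is that an outputSchema implementation could return a single nested expression rather than constructing, populating and wrapping each schema object step by step.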

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-712) Need utilities to create schemas for bags and tuples

2009-04-05 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-712:
---

Attachment: (was: Pig_712_Patch_TestCase.txt)

> Need utilities to create schemas for bags and tuples
> 
>
> Key: PIG-712
> URL: https://issues.apache.org/jira/browse/PIG-712
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Santhosh Srinivasan
>Priority: Minor
> Fix For: 0.3.0
>
>




[jira] Updated: (PIG-712) Need utilities to create schemas for bags and tuples

2009-04-05 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-712:
---

Attachment: (was: Pig_712_Patch.txt)

> Need utilities to create schemas for bags and tuples
> 
>
> Key: PIG-712
> URL: https://issues.apache.org/jira/browse/PIG-712
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Santhosh Srinivasan
>Priority: Minor
> Fix For: 0.3.0
>
>




[jira] Commented: (PIG-737) A few unit tests take a long time to run on windows

2009-04-05 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695980#action_12695980
 ] 

Daniel Dai commented on PIG-737:


I compared unit test times on Unix and Cygwin. This is not a performance 
comparison between Unix and Cygwin, because I used different computers for the 
Unix and Cygwin runs. Rather, I was trying to find out whether any particular 
unit test is significantly slower. All unit tests that use "MiniCluster" (marked 
in red in the table below) are consistently slower under Cygwin in my test. For 
the long unit tests (>100 s), the slowdown ranges from about 1.29 to 2.23. Here 
is the list:

||Test Case||Test Time in Unix (s)||Test Time in Cygwin (s)||
|TestAdd|0.116|0.25|
|{color:red}TestAlgebraicEval{color}|350.899|451.75| 
|TestAlgebraicEvalLocal|23.428|15.531|
|{color:red}TestBZip{color}|35.987|46.391|
|{color:red}TestBestFitCast{color}|211.681|409|
|TestBinaryStorage|1.135|4.062|
|TestBoolean|0.212|0.141|
|TestBuiltin|1.111|0.672|
|TestCmdLineParser|0.064|0.047|
|{color:red}TestCombiner{color}|144.184|222.625|
|{color:red}TestCompressedFiles{color}|24.744|50.359|
|TestConstExpr|0.115|0.063|
|TestConversions|0.385|0.25|
|{color:red}TestCustomSlicer{color}|953.088|211.719|
|TestDataBag|1.329|1.625|
|{color:red}TestDataBagAccess{color}|138.525|261.281|
|TestDataModel|0.264|0.281|
|TestDeleteOnFail|0.072|0.172|
|TestDivide|0.093|0.031|
|TestEqualTo|0.261|0.141|
|{color:red}TestEvalPipeline{color}|608.986|1089.17|
|{color:red}TestEvalPipeline2{color}|64.247|127.532|
|TestEvalPipelineLocal|5.351|4.422|
|{color:red}TestExampleGenerator{color}|3.169|5.578|
|{color:red}TestFRJoin{color}|413.982|667.406|
|TestFilter|0.323|0.156|
|{color:red}TestFilterOpNumeric{color}|66.538|145.922|
|{color:red}TestFilterOpString{color}|54.282|105.844|
|{color:red}TestFilterUDF{color}|17.943|40.14|
|TestFinish|16.55|25.015|
|TestForEach|0.28|0.172|
|TestForEachNestedPlan|20.252|23.312|
|TestForEachNestedPlanLocal|0.591|0.562|
|TestFuncSpec|0.225|0.141|
|TestGTOrEqual|0.264|0.156|
|TestGreaterThan|0.264|0.188|
|TestGrunt|3.083|2.343|
|TestImplicitSplit|1.427|3.453|
|{color:red}TestInfixArithmetic{color}|76.284|146.781|
|{color:red}TestInputOutputFileValidator{color}|0.717|1.984|
|TestInstantiateFunc|0.055|0.032|
|{color:red}TestJobSubmission{color}|6.646|4.641|
|TestKeyTypeDiscoveryVisitor|55.592|88.703|
|TestLTOrEqual|0.264|0.171|
|TestLessThan|0.263|0.171|
|TestLoad|0.286|0.219|
|TestLocal|1.698|2.172|
|TestLocal2|0.713|0.609|
|TestLocalJobSubmission|7.925|8.438|
|TestLocalPOSplit|0.72|1.5|
|TestLocalRearrange|0.27|0.156|
|TestLogToPhyCompiler|1.13|1.719|
|TestLogicalOptimizer|1.519|2.093|
|TestLogicalPlanBuilder|1.804|2.203|
|TestMRCompiler|0.79|0.625|
|{color:red}TestMapReduce{color}|238.474|405.843|
|TestMapReduce2|40.298|46.078|
|TestMod|0.062|0.047|
|TestMultiply|0.062|0.031|
|TestNotEqualTo|0.254|0.188|
|TestNull|0.274|0.172|
|{color:red}TestNullConstant{color}|85.932|172.828|
|TestOperatorPlan|0.236|0.187|
|TestPOBinCond|0.114|0.078|
|TestPOCast|0.302|0.235|
|TestPOCogroup|0.251|0.156|
|TestPOCross|0.227|0.125|
|TestPODistinct|0.079|0.046|
|TestPOGenerate|0.093|0.046|
|TestPOMapLookUp|0.204|0.141|
|{color:red}TestPONegative{color}|9.339|22.531|
|TestPOSort|0.312|0.235|
|TestPOUserFunc|0.291|0.265|
|TestPackage|107.555|86.578|
|TestParamSubPreproc|0.586|2.734|
|{color:red}TestParser{color}|56.527|18.078|
|TestPhyOp|0.261|0.141|
|TestPigContext|1.265|1.922|
|TestPigScriptParser|0.411|0.391|
|{color:red}TestPigServer{color}|3.56|1.391|
|TestPigSplit|0.988|1.672|
|TestProject|0.283|0.188|
|TestRegexp|0.189|0.11|
|TestSchema|0.219|0.156|
|TestSchemaParser|0.323|0.219|
|{color:red}TestSplitStore{color}|363.749|562.797|
|TestStore|0.459|0.313|
|{color:red}TestStoreOld{color}|115.725|212.735|
|TestStreaming|2.017|14.344|
|TestStreamingLocal|1.964|12.875|
|TestSubtract|0.062|0.031|
|TestTypeChecking|1.264|1.781|
|TestTypeCheckingValidator|6.584|5.625|
|TestTypeCheckingValidatorNoSchema|0.408|0.25|
|{color:red}TestUnion{color}|43.671|75.234|
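The slowdown factor in the comment is the Cygwin time divided by the Unix time. As a worked check, the Java sketch below recomputes it for a few of the red (MiniCluster) rows, using the times copied from the table above. The class and method names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class SlowdownSketch {
    // Slowdown factor = Cygwin time / Unix time; values above 1.0 mean Cygwin was slower.
    static double slowdown(double unixSeconds, double cygwinSeconds) {
        return cygwinSeconds / unixSeconds;
    }

    public static void main(String[] args) {
        // {Unix seconds, Cygwin seconds}, copied from the table above.
        Map<String, double[]> times = new LinkedHashMap<>();
        times.put("TestAlgebraicEval", new double[] {350.899, 451.75});
        times.put("TestBestFitCast", new double[] {211.681, 409});
        times.put("TestEvalPipeline", new double[] {608.986, 1089.17});
        times.put("TestFRJoin", new double[] {413.982, 667.406});
        times.put("TestCustomSlicer", new double[] {953.088, 211.719});

        for (Map.Entry<String, double[]> e : times.entrySet()) {
            double factor = slowdown(e.getValue()[0], e.getValue()[1]);
            System.out.printf(Locale.ROOT, "%-20s %.2f%n", e.getKey(), factor);
        }
    }
}
```

For TestAlgebraicEval this gives 451.75 / 350.899, or about 1.29, which is the low end of the quoted range. TestCustomSlicer is included because it is a red row whose ratio comes out at about 0.22. In other words, it ran faster under Cygwin in this data.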

Here is a per-test breakdown for one of the test cases, TestDataBagAccess:

||Test||Test Time in Unix (s)||Test Time in Cygwin (s)||
|testSingleTupleBagAcess|0.261|0.188|
|testNonSpillableDataBag|0.14|0.093|
|testBagConstantAccess|17.773|23.375|
|testBagConstantAccessFailure|0.194|0.438|
|testBagConstantFlatten1|15.794|17.343|
|testBagConstantFlatten2|25.736|35.594|
|testBagStoreLoad|157.403|239.734|

Based on these data, no particular long unit test is significantly slower than 
the others. The slowdown factor is relatively stable considering the diversity 
of the code we are testing. So I think what we need to address is a general 
performance problem under Cygwin rather than any particular unit test. Does 
anyone see exceptions on other computers?

> A few unit tests take a long time to run on windows
> ---
>
> Key: PIG-737
> UR

[jira] Commented: (PIG-712) Need utilities to create schemas for bags and tuples

2009-04-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695979#action_12695979
 ] 

Hadoop QA commented on PIG-712:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12404692/Pig_712_Patch_TestCase.txt
  against trunk revision 759376.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 154 release audit warnings 
(more than the trunk's current 153 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/20/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/20/artifact/trunk/current/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/20/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/20/console

This message is automatically generated.

> Need utilities to create schemas for bags and tuples
> 
>
> Key: PIG-712
> URL: https://issues.apache.org/jira/browse/PIG-712
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Santhosh Srinivasan
>Priority: Minor
> Fix For: 0.3.0
>
> Attachments: Pig_712_Patch.txt, Pig_712_Patch_TestCase.txt
>
>




[jira] Updated: (PIG-712) Need utilities to create schemas for bags and tuples

2009-04-05 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-712:
---

Attachment: Pig_712_Patch_TestCase.txt
Pig_712_Patch.txt

> Need utilities to create schemas for bags and tuples
> 
>
> Key: PIG-712
> URL: https://issues.apache.org/jira/browse/PIG-712
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Santhosh Srinivasan
>Priority: Minor
> Fix For: 0.3.0
>
> Attachments: Pig_712_Patch.txt, Pig_712_Patch_TestCase.txt
>
>




[jira] Updated: (PIG-712) Need utilities to create schemas for bags and tuples

2009-04-05 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-712:
---

Status: Patch Available  (was: Open)

I have submitted two patches: one is the implementation, and the other is the 
test case.
Santhosh, could you review them for me? Thank you in advance.

This is my first time submitting patches, so if anything is wrong with my 
process or code, please point it out.


> Need utilities to create schemas for bags and tuples
> 
>
> Key: PIG-712
> URL: https://issues.apache.org/jira/browse/PIG-712
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Santhosh Srinivasan
>Priority: Minor
> Fix For: 0.3.0
>
> Attachments: Pig_712_Patch.txt, Pig_712_Patch_TestCase.txt
>
>




[jira] Commented: (PIG-745) Please add DataTypes.toString() conversion function

2009-04-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695946#action_12695946
 ] 

Hadoop QA commented on PIG-745:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12404681/PIG-745.patch
  against trunk revision 759376.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/19/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/19/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/19/console

This message is automatically generated.

> Please add DataTypes.toString() conversion function
> ---
>
> Key: PIG-745
> URL: https://issues.apache.org/jira/browse/PIG-745
> Project: Pig
>  Issue Type: Improvement
>Reporter: David Ciemiewicz
> Attachments: PIG-745.patch
>
>
> I'm doing some work in string manipulation UDFs and I've found that it would 
> be very convenient if I could always convert the argument to a chararray 
> (internally a Java String).
> For example, TOLOWERCASE(arg) shouldn't really care whether arg is a 
> bytearray, chararray, int, long, double, or float; it should simply be treated 
> as a string and operated on.
> The simplest and most foolproof approach would be for DataTypes to add a 
> static function, DataTypes.toString, that does all of the argument type 
> checking and provides consistent translation.
> I believe that this function might be coded as:
> // Intended as a static method of DataType, so findType(), findTypeName() and
> // the type constants (BOOLEAN, BYTE, ...) resolve without qualification.
> public static String toString(Object o) throws ExecException {
>     try {
>         switch (findType(o)) {
>         case BOOLEAN:
>             // Booleans are rendered as "1" or "0".
>             return ((Boolean) o) ? "1" : "0";
>         case BYTE:
>             return ((Byte) o).toString();
>         case INTEGER:
>             return ((Integer) o).toString();
>         case LONG:
>             return ((Long) o).toString();
>         case FLOAT:
>             return ((Float) o).toString();
>         case DOUBLE:
>             return ((Double) o).toString();
>         case BYTEARRAY:
>             return ((DataByteArray) o).toString();
>         case CHARARRAY:
>             return (String) o;
>         case NULL:
>             return null;
>         case MAP:
>         case TUPLE:
>         case BAG:
>         case UNKNOWN:
>         default:
>             // Complex and unknown types cannot be converted.
>             int errCode = 1071;
>             String msg = "Cannot convert a " + findTypeName(o) + " to a String";
>             throw new ExecException(msg, errCode, PigException.INPUT);
>         }
>     } catch (ExecException ee) {
>         throw ee;
>     } catch (Exception e) {
>         int errCode = 2054;
>         String msg = "Internal error. Could not convert " + o + " to String.";
>         throw new ExecException(msg, errCode, PigException.BUG);
>     }
> }
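The proposal above depends on Pig's DataType internals. As an approximation of the same case-by-case dispatch, here is a self-contained Java sketch that uses plain instanceof checks so it compiles without Pig on the classpath. It makes three assumptions. byte[] stands in for DataByteArray, IllegalArgumentException stands in for ExecException with error code 1071, and decoding bytes as UTF-8 is an assumption rather than DataByteArray's documented behavior.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;

class ToStringSketch {
    // Converts scalar values to a String, following the case order of the proposal.
    static String toStringValue(Object o) {
        if (o == null) {
            return null;                              // NULL -> null
        } else if (o instanceof Boolean) {
            return ((Boolean) o) ? "1" : "0";         // BOOLEAN -> "1" / "0"
        } else if (o instanceof Byte || o instanceof Integer || o instanceof Long
                || o instanceof Float || o instanceof Double) {
            return o.toString();                      // numeric types use Java's own formatting
        } else if (o instanceof byte[]) {
            // Stand-in for DataByteArray; UTF-8 decoding is an assumption.
            return new String((byte[]) o, StandardCharsets.UTF_8);
        } else if (o instanceof String) {
            return (String) o;                        // CHARARRAY passes through unchanged
        }
        // Maps, tuples, bags and unknown types are rejected, as in the proposal.
        throw new IllegalArgumentException(
                "Cannot convert a " + o.getClass().getSimpleName() + " to a String");
    }

    public static void main(String[] args) {
        Object[] samples = {Boolean.TRUE, 42, 7L, 1.5f, 2.25d, "abc",
                "xyz".getBytes(StandardCharsets.UTF_8), null};
        for (Object sample : samples) {
            System.out.println(toStringValue(sample));
        }
        try {
            toStringValue(new HashMap<String, String>());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Cannot convert a HashMap to a String
        }
    }
}
```

A UDF such as the TOLOWERCASE example could then call the converter once on its argument and work only with the resulting String.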




[jira] Created: (PIG-754) Bugs with load and store and filenames passed with -param containing periods

2009-04-05 Thread David Ciemiewicz (JIRA)
Bugs with load and store and filenames passed with -param containing periods


 Key: PIG-754
 URL: https://issues.apache.org/jira/browse/PIG-754
 Project: Pig
  Issue Type: Bug
Reporter: David Ciemiewicz


This one drove me batty.

I have two files: file and file.right.

file:
{code}
WRONG 
This is file, not file.right.
{code}

file.right:
{code}
RIGHT
This is file.right.
{code}

infile.pig:
{code}
A = load '$infile' using PigStorage();
dump A;
{code}

When I pass in file.right as the infile parameter value, the wrong file is read:

{code}
-bash-3.00$ pig -exectype local -param infile=file.right infile.pig
USING: /grid/0/gs/pig/current
2009-04-05 23:18:36,291 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-05 23:18:36,292 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(WRONG )
(This is file, not file.right.)
{code}

However, if I pass in infile as ./file.right, the script magically works.

{code}
-bash-3.00$ pig -exectype local -param infile=./file.right infile.pig
USING: /grid/0/gs/pig/current
2009-04-05 23:20:46,735 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-05 23:20:46,736 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(RIGHT)
(This is file.right.)
{code}

I do not have this problem if I use the file name with a period in the script 
itself:

infile2.pig
{code}
A = load 'file.right' using PigStorage();
dump A;
{code}

{code}
-bash-3.00$ pig -exectype local infile2.pig
USING: /grid/0/gs/pig/current
2009-04-05 23:22:47,022 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
2009-04-05 23:22:47,023 [main] INFO  
org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
(RIGHT)
(This is file.right.)
{code}

I also experience similar problems when I try to pass in param outfile in a 
store statement.
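Suppose -param substitution were purely textual. Then infile.pig with infile=file.right would become exactly the statement in infile2.pig, which reads the right file. The Java sketch below makes that comparison concrete. It is a hypothetical, simplified $name substituter, not Pig's parameter preprocessor, and the parameter-name syntax it accepts is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified $name substitution; not Pig's parameter preprocessor.
class ParamSubSketch {
    // Assumed parameter-name syntax: a letter or underscore, then letters, digits, underscores.
    private static final Pattern PARAM = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    // Replaces each $name with its value verbatim; unknown names are left as-is.
    static String substitute(String script, Map<String, String> params) {
        Matcher m = PARAM.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement stops '$' and '\' in the value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("infile", "file.right"); // as in: -param infile=file.right

        String substituted = substitute("A = load '$infile' using PigStorage();", params);
        String literal = "A = load 'file.right' using PigStorage();"; // from infile2.pig

        System.out.println(substituted);
        System.out.println(substituted.equals(literal)); // true
    }
}
```

The substituted and literal statements compare equal here. So a plain textual substitution of this kind would not by itself explain the difference reported above.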






[jira] Updated: (PIG-745) Please add DataTypes.toString() conversion function

2009-04-05 Thread David Ciemiewicz (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Ciemiewicz updated PIG-745:
-

Status: Patch Available  (was: Open)

PIG-745.patch adds a DataType.toString() function to the DataType class.

> Please add DataTypes.toString() conversion function
> ---
>
> Key: PIG-745
> URL: https://issues.apache.org/jira/browse/PIG-745
> Project: Pig
>  Issue Type: Improvement
>Reporter: David Ciemiewicz
> Attachments: PIG-745.patch
>
>




[jira] Updated: (PIG-745) Please add DataTypes.toString() conversion function

2009-04-05 Thread David Ciemiewicz (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Ciemiewicz updated PIG-745:
-

Attachment: PIG-745.patch

PIG-745.patch attached.

Patch for consideration to add DataTypes.toString() function.

> Please add DataTypes.toString() conversion function
> ---
>
> Key: PIG-745
> URL: https://issues.apache.org/jira/browse/PIG-745
> Project: Pig
>  Issue Type: Improvement
>Reporter: David Ciemiewicz
> Attachments: PIG-745.patch
>
>




[jira] Commented: (PIG-745) Please add DataTypes.toString() conversion function

2009-04-05 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695921#action_12695921
 ] 

David Ciemiewicz commented on PIG-745:
--

The more I think about this one, the more I realize that the lack of 
DataType.toString() is an oversight in the DataType class.


> Please add DataTypes.toString() conversion function
> ---
>
> Key: PIG-745
> URL: https://issues.apache.org/jira/browse/PIG-745
> Project: Pig
>  Issue Type: Improvement
>Reporter: David Ciemiewicz
>




[jira] Commented: (PIG-752) bzip2 compression and local mode bugs

2009-04-05 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695918#action_12695918
 ] 

David Ciemiewicz commented on PIG-752:
--

Related to this: Pig should also read gzip-compressed data files in local mode, 
since HDFS provides this functionality.

> bzip2 compression and local mode bugs
> -
>
> Key: PIG-752
> URL: https://issues.apache.org/jira/browse/PIG-752
> Project: Pig
>  Issue Type: Bug
>Reporter: David Ciemiewicz
>
> Problem 1) Using the .bz2 file extension does not store results with bzip2 
> compression in local mode (-exectype local).
> If I use the .bz2 filename extension in a STORE statement on HDFS, the 
> results are stored with bzip2 compression.
> If I use the .bz2 filename extension in a STORE statement on local file 
> system, the results are NOT stored with bzip2 compression.
> compact.bz2.pig:
> {code}
> A = load 'events.test' using PigStorage();
> store A into 'events.test.bz2' using PigStorage();
> C = load 'events.test.bz2' using PigStorage();
> C = limit C 10;
> dump C;
> {code}
> {code}
> -bash-3.00$ pig -exectype local compact.bz2.pig
> -bash-3.00$ file events.test
> events.test: ASCII English text, with very long lines
> -bash-3.00$ file events.test.bz2
> events.test.bz2: ASCII English text, with very long lines
> -bash-3.00$ cat events.test | bzip2 > events.test.bz2
> -bash-3.00$ file events.test.bz2
> events.test.bz2: bzip2 compressed data, block size = 900k
> {code}
> The output format in local mode is definitely not bzip2, but it should be.
> Problem 2) Pig in local mode does not decompress bzip2-compressed files, but 
> it should, to be consistent with HDFS.
> read.bz2.pig:
> {code}
> A = load 'events.test.bz2' using PigStorage();
> A = limit A 10;
> dump A;
> {code}
> The output should be human readable but is instead garbage, indicating no 
> decompression took place during the load:
> {code}
> -bash-3.00$ pig -exectype local read.bz2.pig
> USING: /grid/0/gs/pig/current
> 2009-04-03 18:26:30,455 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2009-04-03 18:26:30,456 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (BZh91AY&syoz?u?...@{x_?d?|u-??mK???;??4?C??)
> ((R? 6?*m?&???g, 
> ?6?Zj?k,???0?QT?d???hY?#mJ?>[j???z?m?t?u?K)??K5+??)?m?E7j?X?8a??
> ??U?p@@MT?$?B?P??N??=???(z<}gk...@c$\??i]?g:?J)
> a(R?,?u?v???...@?i@??J??!D?)???A?PP?IY??m?
> (mP(i?4,#F[?I)@>?...@??|7^?}U??wwg,?u?$?T???((Q!D?=`*?}hP??_|??=?(??2???m=?xG?(?rC?B?(33??:4?N???t|??T?*??k??NT?x???=?fyv?w>f??4z???4t?)
> (?oou?t???Kwl?3?nCM?WS?;l???P?s?x
> a???e)B??9?  ?44
> ((?...@4?)
> (f)
> (?...@+?d?0@>?U)
> (Q?SR)
> -bash-3.00$ 
> {code}
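The garbled output above begins with BZh9. That is the bzip2 stream header, with '9' as the block-size digit, which is consistent with the 900k block size that `file` reports. It is also consistent with the report that no decompression took place. Below is a minimal, self-contained Java sketch of detecting bzip2 and gzip input by their leading magic bytes. It is a hypothetical helper, not Pig's or Hadoop's actual codec-selection logic.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: classifies input by its leading magic bytes.
class CompressionSniffer {
    enum Kind { BZIP2, GZIP, PLAIN }

    static Kind detect(InputStream in) throws IOException {
        byte[] head = new byte[4];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 4 bytes or EOF.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        // bzip2 streams start with "BZh" plus a block-size digit '1'..'9' ('9' = 900k).
        if (n >= 4 && head[0] == 'B' && head[1] == 'Z' && head[2] == 'h'
                && head[3] >= '1' && head[3] <= '9') {
            return Kind.BZIP2;
        }
        // gzip streams start with the two bytes 0x1f 0x8b.
        if (n >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return Kind.GZIP;
        }
        return Kind.PLAIN;
    }

    public static void main(String[] args) throws IOException {
        // A bzip2 stream header (compare the first characters of the output above).
        byte[] bzip2Header = "BZh91AY&SY".getBytes(StandardCharsets.US_ASCII);

        // A real gzip stream produced by the JDK.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write("This is file.right.\n".getBytes(StandardCharsets.US_ASCII));
        }

        byte[] plain = "WRONG \n".getBytes(StandardCharsets.US_ASCII);

        System.out.println(detect(new ByteArrayInputStream(bzip2Header)));       // BZIP2
        System.out.println(detect(new ByteArrayInputStream(bos.toByteArray()))); // GZIP
        System.out.println(detect(new ByteArrayInputStream(plain)));             // PLAIN
    }
}
```

A local-mode loader could use such a check, or the file extension as HDFS mode does according to this report, to decide whether to wrap the input in a decompressing stream. The JDK ships a gzip decoder (java.util.zip.GZIPInputStream) but no bzip2 decoder.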
