[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716871#action_12716871 ]

Hudson commented on PIG-796:

Integrated in Pig-trunk #465 (See [http://hudson.zones.apache.org/hudson/job/Pig-trunk/465/])

> support conversion from numeric types to chararray
> ---
>
> Key: PIG-796
> URL: https://issues.apache.org/jira/browse/PIG-796
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.2.0
> Reporter: Olga Natkovich
> Fix For: 0.3.0
>
> Attachments: 796.patch, pig-796.patch, pig-796.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716110#action_12716110 ]

Hadoop QA commented on PIG-796:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12409785/pig-796.patch
against trunk revision 781599.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/70/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/70/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/70/console

This message is automatically generated.
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715406#action_12715406 ]

Hadoop QA commented on PIG-796:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12409612/796.patch
against trunk revision 780722.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    -1 javac. The applied patch generated 225 javac compiler warnings (more than the trunk's current 224 warnings).
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/66/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/66/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/66/console

This message is automatically generated.
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715325#action_12715325 ]

Pradeep Kamath commented on PIG-796:

A few comments:
- In TestPOCast.java, the POCast objects could be given names like "opWithInputTypeAsByteArray", since the intent is not clear from the current names.
- In POCast.java, you can check realType inside the catch clause rather than before trying to cast to ByteArray. This way, if the cast to ByteArray always succeeds, we do not incur the overhead of the if(realType==null) check.
- In POCast.java, you can avoid catching ExecException and checking for errorCode == 1071. Since the getNext() call in POCast already throws ExecException, you can just let ExecExceptions from the DataType.toXXX() methods bubble out.
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714526#action_12714526 ]

Yiping Han commented on PIG-796:

I had the same idea that Alan proposed. I agree that in the common case most values are of the same type. Caching the type, and changing the cached type only when a ClassCastException is caught, would be the most efficient approach.
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714283#action_12714283 ]

Santhosh Srinivasan commented on PIG-796:

Milind,

You are generalizing a specific problem. Pig can convert a byte array to an integer and then to a string, as long as the byte array is convertible to an integer. The problem being discussed concerns the bytes that come out of a map. The title of this jira is incorrect, as I pointed out in my first comment.

Regarding ClassCastExceptions: Pig fails and the script aborts. Here, I am excluding fewer than a handful of cases where we do not bail out.
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714279#action_12714279 ]

Milind Bhandarkar commented on PIG-796:

Modifying my earlier comment:

> So, can we live with the classcastexception generated by the front end ?

I meant the back end, of course.
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714278#action_12714278 ]

Milind Bhandarkar commented on PIG-796:

So, can we live with the ClassCastException generated by the front end?

I recall reading somewhere that pigs do what they are told. If they are told to do things that are impossible even for humans to comprehend, i.e. somehow interpret a byte array as an integer and then convert it to a string, how would they cope?

IMHO, eliminating such implicit casts would reduce the complexity of Pig and would fit the Pig philosophy. But that means being able to convert everything to a chararray, at most. If someone requests a chararray cast of a bytearray, give them a hex representation, and have them write a UDF to convert the hex string (i.e. toInt('0x'+myvalue) in the above code).

Thoughts?
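The hex-representation idea in this comment can be sketched roughly as follows. This is only an illustration, not Pig code: the class `HexChararray` and its methods are hypothetical names, and a plain Java `byte[]` stands in for Pig's bytearray type.

```java
/** Hypothetical illustration of the hex-representation suggestion; not part of Pig. */
final class HexChararray {
    private HexChararray() {}

    /** Renders each byte as two lowercase hex digits. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            // For a Byte argument, %02x formats the unsigned two's-complement value.
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** What a user-written UDF might do: parse the hex string back into an int. */
    static int hexToInt(String hex) {
        return Integer.parseInt(hex, 16);
    }
}
```

Under this scheme the cast itself never has to guess the intended type; the interpretation of the bytes is left entirely to the user's function.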
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714234#action_12714234 ]

Santhosh Srinivasan commented on PIG-796:

Milind,

This issue is in the back end. Users can do what you suggest in the front end.
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714233#action_12714233 ]

Milind Bhandarkar commented on PIG-796:

Can't the user simply do:

{code}
foreach input generate (chararray)((int)mymap#'key') as myvalue;
{code}

Minimizing implicit casting is a good thing (tm) anyway.
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714224#action_12714224 ]

Alan Gates commented on PIG-796:

Can options a and b not be combined? Could we cache the type the first time, and if we see the ClassCastException then attempt to infer the type, caching whatever we see for the next time? This will benefit users who have all or most of their values of the same type, since we won't be introspecting every time. It will penalize users whose values switch type frequently (as exceptions are very slow), but it will still work. I'm guessing the former is much more common than the latter.
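A minimal sketch of the combined approach described in this comment, under stated assumptions: `AdaptiveCaster` is a hypothetical class, not the code in POCast.java, and `String.valueOf` stands in for whatever per-type conversion Pig actually performs.

```java
/** Hypothetical sketch: cache the value type, re-infer it only after a ClassCastException. */
final class AdaptiveCaster {
    private Class<?> cachedType;   // type assumed for the next value
    private int reinferCount;      // how many times the cached type had to be replaced

    String cast(Object value) {
        if (value == null) {
            return null;
        }
        if (cachedType == null) {
            cachedType = value.getClass();   // first value seen: remember its type
        }
        try {
            // Fast path: Class.cast throws ClassCastException on a type mismatch.
            return convert(cachedType.cast(value));
        } catch (ClassCastException e) {
            // Slow path: infer the type again and cache it for subsequent values.
            cachedType = value.getClass();
            reinferCount++;
            return convert(cachedType.cast(value));
        }
    }

    int reinferCount() {
        return reinferCount;
    }

    // Stand-in for the real per-type conversion (an assumption of this sketch).
    private static String convert(Object typedValue) {
        return String.valueOf(typedValue);
    }
}
```

With inputs whose type rarely changes, the catch branch runs rarely, which is the case the comment expects to be common. Inputs that alternate types on every value would pay the exception cost each time but would still be converted.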
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714212#action_12714212 ]

Olga Natkovich commented on PIG-796:

I think we should be safe and check the type for every value.
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714209#action_12714209 ]

Ashutosh Chauhan commented on PIG-796:

Since Pig allows values in a map to be of different types, caching the type may not be safe. There are two possible alternatives:

a) Find the type by introspection every time. This ensures we are always correct and can handle all cases (including when values in maps are of different types). It will, however, incur a performance overhead on every cast call.

b) Find the type the first time and then cache it for subsequent calls. When it encounters a different type, Pig will bail out with a ClassCastException. This avoids the performance overhead, but Pig will die when values in maps are of different types.

In this performance vs. "handling all cases" trade-off, which route should we take?
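The two alternatives can be contrasted in a small sketch. Everything here is illustrative: `CastStrategies` and `CachedTypeCaster` are hypothetical names, only a few value types are handled, and none of this is taken from POCast.java.

```java
/** Hypothetical side-by-side of options (a) and (b); not Pig's implementation. */
final class CastStrategies {
    private CastStrategies() {}

    /** Option (a): inspect the runtime type on every call. Always correct, always pays the check. */
    static String introspectEveryTime(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof Integer) {
            return Integer.toString((Integer) value);
        }
        if (value instanceof Double) {
            return Double.toString((Double) value);
        }
        if (value instanceof String) {
            return (String) value;
        }
        throw new IllegalArgumentException("Unsupported type: " + value.getClass().getName());
    }

    /** Option (b): remember the first type seen; a later value of another type fails. */
    static final class CachedTypeCaster {
        private Class<?> cachedType;

        String cast(Object value) {
            if (value == null) {
                return null;
            }
            if (cachedType == null) {
                cachedType = value.getClass();
            }
            // Class.cast throws ClassCastException if value is not of the cached type.
            return cachedType.cast(value).toString();
        }
    }
}
```

Feeding an Integer and then a Double through `CachedTypeCaster` raises a ClassCastException on the second value, which is the "Pig will die" outcome of option (b); `introspectEveryTime` handles both values.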
[jira] Commented: (PIG-796) support conversion from numeric types to chararray
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708281#action_12708281 ]

Santhosh Srinivasan commented on PIG-796:

Pig supports conversion of numeric types to chararray. The issue being discussed in this jira is the conversion of types that come out of maps. Currently, Pig assumes that the type of the value is bytearray. However, when users use non-bytearray types (like Double, Integer, etc.), conversions fail: Pig expects a bytearray and finds a different type, resulting in a ClassCastException.

An example to illustrate the point. In the script below, the user is explicitly casting the value to a chararray. If the map returns an Integer or a Double, then Pig will bail out with a ClassCastException.

{code}
foreach input generate (chararray)mymap#'key' as myvalue;
{code}

A proposal to address this issue: instead of bailing out, Pig should examine the type of the value and, depending on the expected type (implicit/explicit casts), perform a conversion. These changes are localized to POCast.java. The implication of this change is possibly slower performance, which can be minimized by caching the value type.

On a side note, this issue plays a role in the larger issue of handling unknown types. That is a much bigger topic.
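The proposal to examine the value's type instead of assuming bytearray might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual patch: `MapValueCast` is a hypothetical class, a plain `byte[]` stands in for Pig's bytearray type, and UTF-8 decoding is an arbitrary choice made for the example.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing a type-examining cast to chararray; not Pig's POCast. */
final class MapValueCast {
    private MapValueCast() {}

    /** Converts a map value to a string based on its runtime type. */
    static String toChararray(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof byte[]) {
            // The case assumed today: raw bytes (UTF-8 is an assumption of this sketch).
            return new String((byte[]) value, StandardCharsets.UTF_8);
        }
        if (value instanceof String) {
            return (String) value;
        }
        if (value instanceof Number) {
            // Integer, Long, Float, Double, ... all convert through toString().
            return value.toString();
        }
        throw new IllegalArgumentException(
                "Cannot convert " + value.getClass().getName() + " to chararray");
    }
}
```

The chain of type checks is the per-value overhead the comment mentions, and it is the part that caching the value type would aim to reduce.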